All generations.

generate function - Tue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

plt.figure(figsize=(12, 4))
plt.bar(data.index, data['nutri'], color='darkblue', label='nutri')
plt.bar(data.index, data['mineral'], color='brown', label='mineral')
plt.title('Comparison of Nutrient and Mineral Content')
plt.xlabel('Nutrient|Mineral')
plt.ylabel('Amount')
plt.grid(axis='y')  # horizontal grid lines, as requested in the prompt
plt.legend()
plt.show()

Python
generate function - Wed, 21 Dec 2022

Change the class so that self.seniority defaults to 1 and self.intlawards = 2 class Designer(Employee): def __init__(self, name, seniority, awards): super().__init__(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority) + (self.intlawards*2) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

class Designer(Employee):
    def __init__(self, name, seniority=1, awards=2):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # for each accreditation, increase the counter by 1
        # for now we assume that all designers are accredited
        self.seniority += 1
        # condition for promoting an employee from the presentation;
        # the sum is parenthesized so that the whole total is taken modulo 7
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        # publication of the results
        return self.publish_grade()

Python
generate function - Mon, 26 Dec 2022

Write code that looks for a repeating student number and replaces it with "9090".

lst = [
    ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"],
    ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"],
    ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"],
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"],
    ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"],
    ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"],
    ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"],
    ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"]
]

def find_duplicates(lst):
    seen = set()
    for student in lst:
        if student[0] in seen:
            student[0] = "9090"  # repeated student number
        else:
            seen.add(student[0])
    return lst  # the updated records, not just the list of numbers

find_duplicates(lst)

Python
generate function - Fri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incident_types with a value >= 5. Sort them in descending order by incident_type column

# count incidents per route, keep routes with at least 5, largest first
route_counts = df_new.groupby('route')['incident_type'].count()
route_counts[route_counts >= 5].sort_values(ascending=False)

Python
generate function - Tue, 13 Dec 2022

A string with words and numbers separated by spaces (one space between words and/or numbers). The words consist only of letters. You need to check if there are three words in a row in the original string. For example, the line "start 5 one two three 7 end" has three words in a row. Input: A string with words (str). Output: Answer as a boolean expression (bool), True or False. Examples: checkio("Hello World hello") == True checkio("He is 123 man") == False checkio("1 2 3 4") == False checkio("bla bla bla bla") == True

def checkio(text):
    cnt = 0
    for word in text.split():
        # extend the current run of words, or reset it on a number
        cnt = cnt + 1 if word.isalpha() else 0
        if cnt >= 3:
            return True
    return False

Python
generate function - Thu, 19 Jan 2023

There is a Pandas dataframe:

   place     decade  perc_of_5star
0      1  2010-2020          2.300
1      2  1900-1910          1.379
2      3  1970-1980          1.179
3      4  2000-2010          1.176
4      5  1960-1970          1.133

Build a horizontal bar chart with the perc_of_5star column in descending order and decade values on the y-axis.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
df = df.sort_values('perc_of_5star', ascending=False)
# barh draws real bars; decade labels go on the y-axis
plt.barh(df['decade'], df['perc_of_5star'], color='skyblue')
plt.gca().invert_yaxis()  # largest value at the top
plt.xlabel('perc_of_5star')
plt.ylabel('decade')
plt.show()

Python
generate function - Wed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a "data" dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

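import numpy as np
import pandas as pd

# sl and sw hold one-element rows such as [[-0.90068117], [-1.14301691], ...];
# np.ravel flattens them into plain 1-D columns
data = pd.DataFrame({"sl": np.ravel(sl), "sw": np.ravel(sw)})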

Python
generate function - Mon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

def name_list(group, dct):
    lst = []
    for value in dct.values():
        if value[-1] == group:
            lst.append(value[0] + ' ' + value[1] + ' ' + value[2])
    lst.sort()
    for i, full_name in enumerate(lst, start=1):
        print('{}. {}'.format(i, full_name))

name_list('BST161', dct)

Python
generate function - Mon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one).

stnums = ['4004']
students = [
    ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'],
    ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"],
    ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"],
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"],
    ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"],
    ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"],
    ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"],
    ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"]
]
replacing_num = '9090'

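def replace(students, stnums, replacing_num='9090'):
    seen = set()
    for student in students:
        if student[0] in stnums:
            if student[0] in seen:
                student[0] = replacing_num  # second and later matches
            else:
                seen.add(student[0])  # leave the first match untouched
    return students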

Python
generate function - Fri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In your answer, specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers.

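from math import sqrt
from scipy.stats import norm

def conf_interval(n, mean, sig, conf):
    # two-sided z quantile for the confidence level (about 1.96 for conf = 0.95);
    # this assumes sig is the known standard deviation
    z_value = norm.ppf((1 + conf) / 2)
    half_width = z_value * sig / sqrt(n)
    # mean only positions the interval, it does not change its length
    return round(2 * half_width)

conf_interval(3, 1, 2, 0.95)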
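For reference, the quantity the prompt asks for can be written out explicitly. This is a sketch under the assumption of a normal (z) interval with known standard deviation; the quantile symbol z is notation introduced here, not part of the original prompt.

```latex
\text{interval: } \quad \mathrm{mean} \pm z_{(1+\mathrm{conf})/2}\,\frac{\mathrm{sig}}{\sqrt{n}},
\qquad
\text{length} = 2\, z_{(1+\mathrm{conf})/2}\,\frac{\mathrm{sig}}{\sqrt{n}}
```

For conf = 0.95 the quantile is about 1.96. The length depends on n, sig and conf only; mean shifts the interval without changing its width.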

Python
generate function - Tue, 13 Dec 2022

Write a function that outputs True if the letters in it are one after the other in the string, and False otherwise. def goes_after(word: str, first: str, second: str) -> bool: s = '' for i, let in enumerate(word): print (let[i] + let[i+1]) print(first + second) if let[i] + let[i-1] == first + second: return True else: return False Examples: goes_after("world", "o", "r") Result: True goes_after("world", "w", "r") Result: False

def goes_after(word, first, second):
    for i in range(len(word) - 1):
        if word[i] == first and word[i+1] == second:
            return True
    return False

Python
generate function - Mon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) for each NaN line in this column, find a non-NaN line which has similar values in the pulse or respiratory_rate columns (the difference range is 20%) 3) replace NaN with values from the corresponding rows

import pandas as pd

def fill_na(df, col_name='rectal_temp', range_thresh=0.2):
    df = df.copy()
    donors = df[df[col_name].notnull()]  # rows that can supply a value
    for idx in df.index[df[col_name].isnull()]:
        for ref_col in ('pulse', 'respiratory_rate'):
            ref = df.at[idx, ref_col]
            if pd.isnull(ref):
                continue
            # donors whose ref_col value lies within 20% of this row's value
            close = donors[(donors[ref_col] - ref).abs() <= range_thresh * abs(ref)]
            if not close.empty:
                df.at[idx, col_name] = close[col_name].iloc[0]
                break
    return df

Python
generate function - Mon, 26 Dec 2022

Convert the data so that it is represented as a dictionary in which the keys are student numbers and the values are lists of the remaining student data.

students = [
    ["0001", "Antonov", "Anton", "Igorevich", "20.08.2009", "BST161"],
    ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"],
    ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"],
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"],
    ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"],
    ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"],
    ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"],
    ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"]
]

def convert_to_dict(students):
    students_dict = {}
    for student in students:
        students_dict[student[0]] = student[1:]
    return students_dict

convert_to_dict(students)

Python
generate function - Thu, 19 Jan 2023

There is a Pandas dataframe: index title rating year decade 0 0 "BLOW THE NIGHT!" Let's Spend the Night Togeth... 3.000000 1983 1980-1990 1 1 "Great Performances" Cats (1998) 2.896648 1998 1990-2000 2 2 #1 Cheerleader Camp (2010) 2.111111 2010 2010-2020 3 3 #Captured (2017) 3.750000 2017 2010-2020 4 4 #Female Pleasure (2018) 3.666667 2018 2010-2020 ... ... ... ... ... ... 58953 58953 …And the Fifth Horseman Is Fear (1965) 3.250000 1965 1960-1970 58954 58954 キサラギ (2007) 3.250000 2007 2000-2010 58955 58955 チェブラーシカ (2010) 3.166667 2010 2010-2020 58956 58956 貞子3D (2012) 2.375000 2012 2010-2020 58957 58957 줄탁동시 (2012) 4.000000 2012 2010-2020 Write a function that groups titles by decade and calculates the percentage of titles with a rating of 5.0 relative to all titles in that decade

def group_by_rating(df):
    grouped = df.groupby('decade')
    result = {}
    for decade, group in grouped:
        # share of titles rated exactly 5.0, expressed as a percentage
        result[decade] = (group.rating == 5.0).sum() / len(group) * 100
    return result

Python
generate function - Thu, 23 Mar 2023

This code gives an error AttributeError: 'MultiComparison' object has no attribute 'plot_simultaneous' from statsmodels.stats.multicomp import pairwise_tukeyhsd tukey.plot_simultaneous(comparison_name="C") Fix it!

import pandas as pd
from statsmodels.stats.multicomp import MultiComparison

data = pd.read_csv('/Users/josephfrasca/Coding/Python/Python-Practice/Stats/ToothGrowth.csv')

# plot_simultaneous belongs to the results object returned by tukeyhsd(),
# not to MultiComparison itself
mc = MultiComparison(data['len'], data['supp'])  # column names are assumptions
tukey = mc.tukeyhsd()
tukey.plot_simultaneous(comparison_name="C")  # "C" must be one of the group labels

Python
generate function - Mon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, replace it with replacing_num (but only the 2nd value).

stnums = ['4004']
students = [
    ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'],
    ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"],
    ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"],
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"],
    ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"],
    ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"]
]
replacing_num = '9090'

Example: In the list of stnums we see the value '4004'. It occurs twice in the list of students. We need to change the second one to replacing_num. The result should be this:

[
    ...
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"]
    ...
    ["9090", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"]
]

def replace_sec(stnums, students, replacing_num):
    for stnum in stnums:
        num = 0  # how many times stnum has been seen so far
        for student in students:
            if stnum == student[0]:
                num += 1
                if num > 1:
                    student[0] = replacing_num
    return students

replace_sec(stnums, students, replacing_num)

Python
generate function - Fri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Merge the rows with duplicates and sort the name column in ascending order

import pandas as pd

def merge_df(df):
    # keep=False marks every row whose name occurs more than once
    duplicated = df[df.duplicated(subset=["name"], keep=False)]
    return duplicated.sort_values(by='name')

df = pd.read_csv('test_data.csv')
merge_df(df)

Python
generate function - Thu, 22 Dec 2022

The Collatz conjecture (also known as the 3n+1 conjecture) states that applying the following algorithm to any positive number will always eventually reach one:

[This is written in pseudocode]
if (number is even) number = number / 2
if (number is odd) number = 3*number + 1

#Task
Your task is to make a function hotpo that takes a positive n as input and returns the number of times you need to perform this algorithm to get n = 1.

#Examples
hotpo(1) returns 0 (1 is already 1)
hotpo(5) returns 5
5 -> 16 -> 8 -> 4 -> 2 -> 1
hotpo(6) returns 8
6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
hotpo(23) returns 15
23 -> 70 -> 35 -> 106 -> 53 -> 160 -> 80 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1

def hotpo(n):
    steps = 0
    while n > 1:
        if n % 2 == 0:
            n //= 2  # integer division keeps n an int
        else:
            n = 3 * n + 1
        steps += 1
    return steps

Python
generate function - Wed, 08 Mar 2023

Add to the code below winning $1 on "000", "007" combinations for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

def payout(a):
    if a == 777:
        return 200
    elif a == 999:
        return 100
    elif a == 555:
        return 50
    elif a == 333:
        return 15
    elif a == 111:
        return 10
    elif a in (0, 7):  # the new "000" and "007" combinations pay $1
        return 1
    elif a % 100 == 77:  # ends in 77
        return 5
    elif a % 10 == 7:  # ends in 7
        return 3
    elif a % 100 == 0:  # ends in 00
        return 2
    elif a % 10 == 0:  # ends in 0
        return 1
    else:
        return 0

def find_score(num_trials):
    money = 0
    successes = 0
    for i in range(num_trials):
        # start the range at 0, otherwise "000" can never be rolled
        win = payout(np.random.randint(0, 1000))
        if win:
            money += win
            successes += 1
        else:
            money -= 1  # a losing roll costs $1, as in the original loop
    return money, successes

Python
generate function - Mon, 12 Dec 2022

Complete function saleHotdogs/SaleHotDogs/sale_hotdogs. The function accepts 1 parameter: n, the number of hotdogs a customer will buy. Different numbers have different prices (refer to the following table). Return how much money the customer will spend to buy that number of hotdogs. Use only the ternary operator.

number of hotdogs      price per unit (cents)
n < 5                  100
n >= 5 and n < 10      95
n >= 10                90

def sale_hotdogs(n):
    # chained ternary picks the unit price, as the task requires
    return n * (100 if n < 5 else 95 if n < 10 else 90)

Python
generate function - Wed, 08 Mar 2023

Upgrade the code so that a roll of 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090 will drop 1 dollar for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

def add(a, b):
    return a + b

add(1, 2)
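The rule requested in the prompt (1 dollar for 000, 007 ... 097 and 010 ... 090) can be sketched as one payout function. This is a minimal sketch, not the original author's code. The names payout and simulate are hypothetical. The modulo checks stand in for the ends77, ends7, ends00 and ends0 helpers, whose definitions are not shown in the source. Drawing from 0..999 with the standard random module, so that 000 can occur, is also an assumption.

```python
import random


def payout(num):
    """Winnings for a roll in 0..999, following the branch order of the prompt's loop."""
    if num == 777:
        return 200
    if num == 999:
        return 100
    if num == 555:
        return 50
    if num == 333:
        return 15
    if num == 111:
        return 10
    # New rule: rolls with a leading zero that end in 0 or 7
    # (000, 007..097, 010..090) pay 1 dollar. It is checked before the
    # generic "ends in ..." rules, so 077 pays 1 rather than 5.
    if num < 100 and num % 10 in (0, 7):
        return 1
    if num % 100 == 77:  # assumed meaning of ends77
        return 5
    if num % 10 == 7:  # assumed meaning of ends7
        return 3
    if num % 100 == 0:  # assumed meaning of ends00
        return 2
    if num % 10 == 0:  # assumed meaning of ends0
        return 1
    return 0


def simulate(trials, seed=None):
    """Play `trials` rolls; a losing roll costs 1 dollar, as in the prompt."""
    rng = random.Random(seed)
    money = 0
    successes = 0
    for _ in range(trials):
        win = payout(rng.randrange(1000))  # 0..999 inclusive
        if win:
            money += win
            successes += 1
        else:
            money -= 1
    return money, successes
```

Placing the leading-zero check ahead of the generic suffix checks is a design choice: it follows the literal list in the prompt, where 077 and 000 are named among the 1-dollar rolls.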

Python
generate function - Tue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

def graph_difference(data, title):
    difference = data['nutri'] - data['mineral']
    percentage = (difference / data['nutri']) * 100
    plt.figure(figsize=(12, 4))
    plt.bar(data.index, data['nutri'], color='darkblue', label='nutri')
    plt.bar(data.index, data['mineral'], color='brown', label='mineral')
    plt.title(title, fontsize=20)
    plt.grid(axis='y')
    plt.legend()
    for i in range(len(data)):
        # .iloc gives positional access regardless of the index labels;
        # the x offset assumes a numeric index
        plt.text(x=data.index[i] - 0.3, y=data['nutri'].iloc[i] + 0.1,
                 s=str(round(percentage.iloc[i], 1)) + '%', size=12)
    plt.show()

Python
generate function - Fri, 23 Dec 2022

You have a bunch of red and blue marbles. To start the game you grab a handful of marbles of each color and put them into the bag, keeping track of how many of each color go in. You take turns reaching into the bag, guessing a color, and then pulling one marble out. You get a point if you guessed correctly. You've decided to write a function, guessBlue() to help automatically calculate whether you should guess "blue" or "red". The function should take four arguments: the number of blue marbles you put in the bag to start the number of red marbles you put in the bag to start the number of blue marbles pulled out so far (always lower than the starting number of blue marbles) the number of red marbles pulled out so far (always lower than the starting number of red marbles) guessBlue() should return the probability of drawing a blue marble, expressed as a float. For example, guessBlue(5, 5, 2, 3) should return 0.6.

def guess_blue(blue_start, red_start, blue_pulled, red_pulled):
    blue_left = blue_start - blue_pulled
    red_left = red_start - red_pulled
    return blue_left / (blue_left + red_left)

Python
generate function - Mon, 27 Mar 2023

There are variables X_train, X_test, y_train, y_test created through sklearn train_test_split. They contain text data in string format. Convert them to float numbers.

import numpy as np

# assumes every string holds a numeric literal such as "3.14"
X_train = np.asarray(X_train).astype(float)
X_test = np.asarray(X_test).astype(float)
y_train = np.asarray(y_train).astype(float)
y_test = np.asarray(y_test).astype(float)

Python
generate function - Wed, 08 Feb 2023

def get_needed_posts(query): site = pd.DataFrame() for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles') for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) Modify the Python function above so that if it finds a repeated title or link, it will not add it to the return

def get_needed_posts(query):
    site = pd.DataFrame(columns=['date', 'title', 'link'])
    for q in query:
        URL = parseurl + 'search/'
        params = {'q': q}
        req = requests.get(URL, params=params)
        soup = BeautifulSoup(req.text)
        articles = soup.find_all('article', class_='tm-articles')
        for article in articles:
            try:
                title = article.find('h2', class_='tm-article').text
                date = article.find('span', class_='tm-article').text.strip()
                link = 'https://habr.com' + article.find('h2', class_='tm-article').find('a').get('href')
            except (AttributeError, TypeError):
                continue  # skip articles with missing fields
            # compare against the stored (full) links and titles
            if title in site['title'].values or link in site['link'].values:
                continue
            row = {'date': date, 'title': title, 'link': link}
            site = pd.concat([site, pd.DataFrame([row])])
    return site.reset_index(drop=True)

Python
generate function - Sat, 04 Mar 2023

Add the x-axis value captions to the code of this graph and change its color to purple gdpdiff[['Country or region', 'GDP per capita']].plot(kind='bar')

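gdpdiff = gdpdiff.dropna(subset=['GDP per capita'])
gdpdiff = gdpdiff[gdpdiff['GDP per capita'] > 0]
gdpdiff = gdpdiff.sort_values(by='GDP per capita')
# replace() returns a new Series, so the result has to be assigned back
gdpdiff['Country or region'] = gdpdiff['Country or region'].replace({
    'United States': 'USA',
    'United Kingdom': 'UK'
})
# x= puts the country names under the bars; color= makes the bars purple
gdpdiff.plot(x='Country or region', y='GDP per capita', kind='bar', color='purple')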

Python
generate function - Tue, 24 Jan 2023

There is a Pandas dataframe:

   userId  movieId  rating   timestamp
0       1       31     2.5  1260759144
1       1     1029     3.0  1260759179
2       1     1061     3.0  1260759182
3       1     1129     2.0  1260759185
4       1     1172     4.0  1260759205

Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the "timestamp" column for a given "userId".

import pandas as pd

ratings = pd.read_csv('ratings.csv')
print(ratings.head())

# Lifetime of one user: latest timestamp minus earliest timestamp
def lifetime(group):
    return group.max() - group.min()

lifetime_users = ratings.groupby('userId')[['timestamp']].agg(lifetime)
print(lifetime_users)
average_lifetime = lifetime_users['timestamp'].mean()
print(average_lifetime)

# output:
#    userId  movieId  rating   timestamp
# 0       1       31     2.5  1260759144
# 1       1     1029     3.0  1260759179
# 2       1     1061     3.0  1260759182
# 3       1     1129     2.0  1260759185
# 4       1     1172     4.0  1260759205
#         timestamp
# userId
# 1          203560
# 2          866607
# 3 8

Python
generate function - Wed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Plot a bar chart with number_of_hits on the x-axis and performer on the y-axis.

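import matplotlib.pyplot as plt

# df is the dataframe from the prompt; its column is named num_of_hits.
# A horizontal bar chart puts the hit counts on the x-axis and performers on the y-axis.
df.plot(kind='barh', x='performer', y='num_of_hits')
plt.show()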

Python
generate function - Wed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

import pandas as pd

df = pd.read_csv('artist_song_chart_debut.csv')
dfs = df.copy()  # new dataframe, df itself stays unchanged
dfs['chart_debut'] = dfs['chart_debut'].astype(str).str[:4]
dfs

Python
generate function - Mon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that will create a new dataframe hot_years and leave in it only those lines where av_temp > 12

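def get_hot_years(df, threshold=12):
    # keep whole rows, not just the year column
    return df[df["av_temp"] > threshold].copy()

hot_years = get_hot_years(df_ru)
print(hot_years)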

Python
generate function - Tue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

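# Assumes y_test is a pandas Series that kept the row labels of df after the split
misclassified = []
for position, label_predicted in enumerate(predicted):
    actual = y_test.iloc[position]  # positional lookup avoids the KeyError
    if label_predicted != actual:
        row_label = y_test.index[position]  # label of the source row in df
        misclassified.append({'message': df.loc[row_label, 'Message'],
                              'actual': actual,
                              'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)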

Python
generate function - Fri, 17 Mar 2023

Determine the sample size needed to study the average coffee bill in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

from math import ceil
from scipy.stats import norm

def check_sample_size(conf=0.95, delta=0.05, sigsqr=225):
    """
    conf - confidence level
    delta - margin of error
    sigsqr - variance
    """
    p = 1 - ((1 - conf) / 2)
    z = norm.ppf(p)
    n = (z ** 2) * sigsqr / delta ** 2
    return ceil(n)  # round up so the margin of error is not exceeded

print(check_sample_size())
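The expression for n follows from setting the half-width of a normal (z) interval equal to the margin of error. This is a sketch assuming a known variance; the symbol z for the standard normal quantile and the choice to round up are conventions assumed here, not stated in the prompt.

```latex
\mathrm{delta} = z\,\sqrt{\frac{\mathrm{sigsqr}}{n}}
\quad\Longrightarrow\quad
n = \left\lceil \frac{z^{2}\,\mathrm{sigsqr}}{\mathrm{delta}^{2}} \right\rceil,
\qquad
z = \Phi^{-1}\!\left(1 - \frac{1-\mathrm{conf}}{2}\right)
```

Because delta enters squared, halving the margin of error roughly quadruples the required sample size.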

Python
generate function - Sat, 25 Feb 2023

This code gives an error "expected string or bytes-like object" Refactor it: import re def rate_group(int_rate): int_rate = re.sub('%', '', int_rate) int_rate = float(int_rate) if int_rate > 15.0: return '>15' elif 10.0 < int_rate <= 15.0: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(rate_group)

def rate_group(int_rate):
    # int_rate may already be numeric, which re.sub cannot handle;
    # converting to str first works for both '13.5%' and 13.5
    int_rate = float(str(int_rate).replace('%', ''))
    if int_rate > 15.0:
        return '>15'
    elif 10.0 < int_rate <= 15.0:
        return '10-15'
    else:
        return '<10'

df['rate_group'] = df['int_rate'].apply(rate_group)

Python
generate function - Fri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

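from scipy.stats import norm

def interval(n, mean, sig, conf):
    z = norm.ppf((1 + conf) / 2)  # two-sided z quantile; mean does not affect the length
    h = sig / (n ** (1 / 2)) * z
    return round(h * 2)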

Python
generate function - Wed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis

question6 = 'What is the most preferred working environment for you.'
question6 = df[question6].value_counts()
label = question6.index
counts = question6.values
fig = px.bar(x=counts, y=label, orientation='h')
fig.update_layout(title_text='Which working environment do you prefer most?')
fig.show()

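# vertical bars: answers on the x-axis, counts as bar heights
fig = px.bar(x=label, y=counts)
fig.update_layout(title_text='Which working environment do you prefer most?')
fig.show()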

Python
generate function - Tue, 21 Mar 2023

Rewrite this code to calculate Wilcoxon statistics stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

from scipy.stats import ranksums

# The two versions are independent groups, so the Wilcoxon rank-sum test is used;
# the signed-rank variant (scipy.stats.wilcoxon) needs paired samples of equal length.
def my_ranksums(data, col, by, group1, group2):
    # select the measurements for the two conditions
    data1 = data[data[by] == group1][col]
    data2 = data[data[by] == group2][col]
    stat, p = ranksums(data1, data2)
    print('Wilcoxon rank-sum Statistics=%.3f, p=%.3f' % (stat, p))
    return stat, p

my_ranksums(data, 'sum_gamerounds', 'version', 'gate_30', 'gate_40')

Python
generate function - Wed, 08 Mar 2023

Write code that adds money +=1 on any number from the list of winnums money = 0 winnums = [777, 999, 555, 333, 111, 177, 277, 377, 477, 577, 677, 877, 977, 100, 200, 300, 400, 500, 600, 700, 800, 900, 110, 120, 130, 140, 150, 160, 170, 180, 190, 210, 220, 230, 240, 250, 260, 270, 280, 290, 310, 320, 330, 340, 350, 360, 370, 380, 390, 410, 420, 430, 440, 450, 460, 470, 480, 490, 510, 520, 530, 540, 550, 560, 570, 580, 590, 610, 620, 630, 640, 650, 660, 670, 680, 690, 710, 720, 730, 740, 750, 760, 770, 780, 790, 810, 820, 830, 840, 850, 860, 870, 880, 890, 910, 920, 930, 940, 950, 960, 970, 980, 990, 107, 117, 127, 137, 147, 157, 167, 177, 187, 197, 207, 217, 227, 237, 247, 257, 267, 277, 287, 297, 307, 317, 327, 337, 347, 357, 367, 377, 387, 397, 407, 417, 427, 437, 447, 457, 467, 477, 487, 497, 507, 517, 527, 537, 547, 557, 567, 577, 587, 597, 607, 617, 627, 637, 647, 657, 667, 677, 687, 697, 707, 717, 727, 737, 747, 757, 767, 787, 797, 807, 817, 827, 837, 847, 857, 867, 877, 887, 897, 907, 917, 927, 937, 947, 957, 967, 977, 987, 997, 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090]

def money_add(num, money=0):
    # winnums is the list from the prompt; pay 1 when the roll is in it
    if num in winnums:
        money += 1
    return money
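One caveat about the winnums list in the prompt: Python 3 rejects integer literals with a leading zero such as 007 or 017, so the list cannot be pasted into code as written. A minimal sketch of one way around this, assuming rolls are compared as zero-padded three-digit strings. WINNUMS below is a hypothetical short subset of the prompt's list, and add_money is an illustrative name.

```python
# Hypothetical subset of the prompt's list, stored as strings so that
# leading zeros survive.
WINNUMS = {"777", "999", "000", "007", "017", "010", "100", "110"}


def add_money(roll, money, winnums=WINNUMS):
    """Return money + 1 if the zero-padded roll is a winning number, else money."""
    key = f"{roll:03d}"  # 7 -> "007", 10 -> "010"
    return money + 1 if key in winnums else money
```

A set is used instead of a list so that each membership test takes constant time, which matters when the check runs inside a long simulation loop.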

Python
generate function - Tue, 20 Dec 2022

Change the date_range function to return a list of all dates between start and end in list def date_range(start_date, end_date): lst = [] if start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) lst.append(start.strftime('%Y-%m-%d')) start += td(days=1) return lst except: return [] Examples: Input: date_range('2022-12-01', '2022-12-05') Output: ['2022-12-01', '2022-12-02', '2022-12-03', '2022-12-04', '2022-12-05']

from datetime import datetime as dt, timedelta as td

def date_range(start_date, end_date):
    lst = []
    if start_date > end_date:
        return []
    else:
        try:
            start = dt.strptime(start_date, '%Y-%m-%d')
            end = dt.strptime(end_date, '%Y-%m-%d')
            while start <= end:
                lst.append(start.strftime('%Y-%m-%d'))
                start += td(days=1)
            return lst
        except ValueError:  # malformed date string
            return []

date_range('2022-12-01', '2022-12-05')

Python
generate function - Wed, 22 Feb 2023

Modify the code below to have column captions for the x-axis values

question6 = 'What is the most preferred working environment for you.'
question6 = df[question6].value_counts()
label = question6.index
counts = question6.values
fig = px.bar(x=counts, y=label, orientation='h')
fig.update_layout(title_text='Which working environment do you prefer most?')
fig.show()

import plotly.express as px
def bar_plot(question_text, title, y_title, name):
    # Count how often each answer occurs
    question = df[question_text].value_counts()
    label = question.index
    counts = question.values
    fig = px.bar(x=counts, y=label, orientation='h')
    fig.update_layout(title_text=title)
    fig.update_yaxes(title_text=y_title)  # caption for the answers axis
    fig.update_xaxes(title_text=name)     # caption for the counts axis
    fig.show()
question_text = 'What is the most preferred working environment for you.'
title = 'What working environment do you prefer most?'
y_title = 'Answers'
name = 'Number of answers'
bar_plot(question_text, title, y_title, name)

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

def year_leaders(df):
    # Index label of the row with the most hits within each debut year
    leader_idx = df.groupby('chart_debut')['num_of_hits'].idxmax()
    # Keep one row per year with only the requested columns
    return df.loc[leader_idx, ['chart_debut', 'performer', 'hits']].reset_index(drop=True)

Python
View
generate functionWed, 08 Mar 2023

Write code that adds money +=1 on any number from the list of winnums winnums = [777, 999, 555, 333, 111, 177, 277, 377, 477, 577, 677, 877, 977, 100, 200, 300, 400, 500, 600, 700, 800, 900, 110, 120, 130, 140, 150, 160, 170, 180, 190, 210, 220, 230, 240, 250, 260, 270, 280, 290, 310, 320, 330, 340, 350, 360, 370, 380, 390, 410, 420, 430, 440, 450, 460, 470, 480, 490, 510, 520, 530, 540, 550, 560, 570, 580, 590, 610, 620, 630, 640, 650, 660, 670, 680, 690, 710, 720, 730, 740, 750, 760, 770, 780, 790, 810, 820, 830, 840, 850, 860, 870, 880, 890, 910, 920, 930, 940, 950, 960, 970, 980, 990, 107, 117, 127, 137, 147, 157, 167, 177, 187, 197, 207, 217, 227, 237, 247, 257, 267, 277, 287, 297, 307, 317, 327, 337, 347, 357, 367, 377, 387, 397, 407, 417, 427, 437, 447, 457, 467, 477, 487, 497, 507, 517, 527, 537, 547, 557, 567, 577, 587, 597, 607, 617, 627, 637, 647, 657, 667, 677, 687, 697, 707, 717, 727, 737, 747, 757, 767, 787, 797, 807, 817, 827, 837, 847, 857, 867, 877, 887, 897, 907, 917, 927, 937, 947, 957, 967, 977, 987, 997, 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090]

money = 0
for i in winnums:
    print(i)
    money += 1  # one unit of money per number in winnums
print(money)

Python
View
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe:
    pid           Name Class 1 Class 2  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Legendary
0     1      Bulbasaur   Grass  Poison  45      49       49       65       65     45      False
1     2        Ivysaur   Grass  Poison  60      62       63       80       80     60      False
2     3       Venusaur   Grass  Poison  80      82       83      100      100     80      False
3     4  Mega Venusaur   Grass  Poison  80     100      123      122      120     80      False
48   49         Oddish   Grass  Poison  45      50       55       75       65     30      False
Write a function that searches for 'Rock' values in the Class 1 and Class 2 columns and deletes the rows in which it finds them.

def delete_rock_rows(df): ...
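The generated stub above leaves the function body empty. A minimal sketch of one way to fill it in, assuming the column names 'Class 1' and 'Class 2' from the prompt; the sample rows (including the Geodude row) are made up for illustration and are not part of the original dataframe:

```python
import pandas as pd

def delete_rock_rows(df):
    # True for rows where either class column equals 'Rock'
    is_rock = (df['Class 1'] == 'Rock') | (df['Class 2'] == 'Rock')
    # Keep everything else; the original dataframe is not modified
    return df[~is_rock]

# Hypothetical sample data, not taken from the original dataframe
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Geodude', 'Oddish'],
    'Class 1': ['Grass', 'Rock', 'Grass'],
    'Class 2': ['Poison', 'Ground', 'Poison'],
})
filtered = delete_rock_rows(sample)
print(filtered)
```

Returning a filtered copy rather than dropping rows in place keeps the caller's dataframe intact, which is usually the safer default.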

Python
View
generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy Add it to output all 66 lines of the dataframe

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

Python
View
generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g def total_ingredients(cook_book): dish = 'salad' portions = 5 grams = cook_book['quantity'] * portions if key == 'salad': grams = 'quantity' * portions print(grams) cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} total_ingredients(cook_book)

def total_ingredients(cook_book): dish = 'salad' portions = 5 grams = cook_book['quantity'] * portions if key == 'salad': grams = 'quantity' * portions print(grams) cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} total_ingredients(cook_book)
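The generated code above repeats the prompt unchanged and fails with a KeyError, because cook_book has no 'quantity' key at the top level. A minimal sketch of what the task seems to ask for, assuming the dish name and the number of portions are passed in as parameters (an assumption, the prompt hard-codes them) and keeping the source's 'ingridient_name' spelling:

```python
def total_ingredients(cook_book, dish, portions):
    # Sum quantities per (ingredient, measure) pair so that the same
    # ingredient with different measures is counted separately
    totals = {}
    for item in cook_book[dish]:
        key = (item['ingridient_name'], item['measure'])
        totals[key] = totals.get(key, 0) + item['quantity'] * portions
    return totals

cook_book = {'salad': [
    {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'},
    {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'},
    {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'},
]}

totals = total_ingredients(cook_book, 'salad', 5)
for (name, measure), quantity in totals.items():
    print(f'{name.capitalize()}: {quantity} {measure}')
```

With the data as given, cucumbers come out as 500 g rather than the 60 g quoted in the prompt, so the prompt's expected output presumably came from different input values.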

Python
View
generate functionTue, 20 Dec 2022

Change the date_range function to return a list of all dates between start and end instead of return interval from datetime import datetime as dt from datetime import timedelta as td def date_range(start_date, end_date): lst = [] If start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) interval = end - start return interval except: return 'Incorrect date format' date_range('2022-12-01', '2022-12-20') Translated with www.DeepL.com/Translator (free version)

from datetime import datetime as dt
from datetime import timedelta as td
def date_range(start_date, end_date):
    lst = []
    if start_date > end_date:
        return []
    try:
        start = dt.strptime(start_date, '%Y-%m-%d')
        end = dt.strptime(end_date, '%Y-%m-%d')
    except ValueError:
        return 'Incorrect date format'
    interval = end - start
    # One entry per day, both endpoints included
    for i in range(interval.days + 1):
        lst.append(start.strftime('%Y-%m-%d'))
        start = start + td(days=1)
    return lst
date_range('2022-12-01', '2022-12-20')

Python
View
generate functionSat, 25 Feb 2023

There is a Pandas dataframe:
   loan_amnt int_rate
0       5000   10.65%
1       2500   15.27%
2       2400   15.96%
3      10000   13.49%
4       3000   12.69%
5       5000    7.90%
6       7000   15.96%
You need to split the values of the int_rate column (in object data format) into categories. Write a function that creates a rate_group column and adds values to it as follows:
1) if int_rate > 15.00%, rate_group should be '>15'
2) if int_rate is between 10.00% and 15.00%, then rate_group should be '10-15'

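def rate_group(value):
    # int_rate is stored as a string such as '10.65%', so convert it first
    rate = float(value.rstrip('%'))
    if rate > 15.00:
        return '>15'
    elif 10.00 <= rate <= 15.00:
        return '10-15'
    else:
        return '<10'
df['rate_group'] = df['int_rate'].apply(rate_group)
df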

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

def year_leaders(df):
    # Put the largest num_of_hits first, then take the top row of each year
    df = df.sort_values('num_of_hits', ascending=False)
    df = df.groupby('chart_debut').head(1)
    return df[['chart_debut', 'performer', 'hits']].reset_index(drop=True)

Python
View
generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams)

def multiply_dict(key, portions=5):
    cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'},
                           {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'},
                           {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]}
    # Multiply the quantity of every ingredient of the dish by portions
    for ingredient in cook_book.get(key, []):
        grams = ingredient['quantity'] * portions
        print(grams)
multiply_dict('salad')

Python
View
generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, only the 2nd value must be replaced by replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace_stnum(stnums, students, replacing_num):
    seen = set()
    for student in students:
        if student[0] in stnums:
            # Leave the first occurrence alone, replace every later one
            if student[0] in seen:
                student[0] = replacing_num
            else:
                seen.add(student[0])
    return students
print(replace_stnum(stnums, students, replacing_num))

Python
View
generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

plt.plot(data.index, data['nutri'] - data['mineral'], color = 'red', linewidth = 2, label = 'difference') plt.xticks(data.index, data['year'], rotation = 90) plt.xlabel('Year') plt.ylabel('Profit') plt.legend() plt.show()
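The generated code above draws a difference line but adds neither the grid nor the percentage labels that the prompt asks for. A minimal sketch of both, assuming "percentage difference" means (nutri - mineral) / mineral * 100 and using made-up values in place of the original data; the output file name is likewise hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the original dataframe
data = pd.DataFrame({'nutri': [4, 6, 5], 'mineral': [2, 3, 4]})

# Percentage difference of nutri relative to mineral (assumed definition)
pct_diff = (data['nutri'] - data['mineral']) / data['mineral'] * 100

plt.figure(figsize=(12, 4))
plt.bar(data.index, data['nutri'], color='darkblue', label='nutri')
plt.bar(data.index, data['mineral'], color='brown', label='mineral')
plt.grid(axis='y', linestyle='--', alpha=0.5)  # horizontal grid lines

# Write the percentage just above the taller bar of each pair
for x, pct in zip(data.index, pct_diff):
    top = max(data['nutri'][x], data['mineral'][x])
    plt.text(x, top, f'{pct:.0f}%', ha='center', va='bottom')

plt.legend()
plt.savefig('comparison.png')  # use plt.show() instead in a notebook
```

plt.text is used here because it works on any matplotlib version; newer releases also offer a bar_label helper for the same purpose.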

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

import pandas as pd
# create a new dataframe as a copy so that df itself is left unchanged
dfs = df.copy()
# replace the full date (1991-06-29) with its year (1991)
dfs['chart_debut'] = pd.to_datetime(dfs['chart_debut']).dt.year
# show the top 5 rows
print(dfs.head())

Python
View
generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams)

def multiply_portions(cook_book, portions=5):
    # Multiply the quantity of every ingredient of every dish by portions
    for dish, ingredients in cook_book.items():
        for ingredient in ingredients:
            grams = ingredient['quantity'] * portions
            print(grams)

Python
View
generate functionSat, 28 Jan 2023

There is a Pandas dataframe:
         date     city      hotel  total_rooms  ocup_rooms  revenue  oper_costs  adv_costs  profit
0  2021-07-01    Yalta   Rapsodia           33          24    78936       25641      19305   33990
1  2021-07-01    Yalta  AquaMania           28          14    51156       21756      16380   13020
2  2021-07-01    Yalta    Skyline           15          12    38016       11655       8775   17586
3  2021-07-01  Alushta     Breeze           36          25    75475       27972      21060   26443
Calculate the scalar value as a percentage for all columns using the following formula: profit / revenue

df['profit_perc'] = df['profit'] / df['revenue']
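The line above yields a per-row ratio rather than a percentage. If a single scalar over the whole dataframe is wanted, which is one reading of the prompt, it could be sketched as below. The sample rows are copied from the prompt; the name profit_share is an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'hotel': ['Rapsodia', 'AquaMania', 'Skyline', 'Breeze'],
    'revenue': [78936, 51156, 38016, 75475],
    'profit': [33990, 13020, 17586, 26443],
})

# Per-row profit margin in percent
df['profit_perc'] = df['profit'] / df['revenue'] * 100

# Single scalar: total profit as a percentage of total revenue
profit_share = df['profit'].sum() / df['revenue'].sum() * 100
print(round(profit_share, 2))
```

Dividing the sums is not the same as averaging the per-row percentages: the former weights each hotel by its revenue, which is usually what an overall margin means.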

Python
View
generate functionThu, 15 Dec 2022

Refine the function below to handle errors 1) if shelf variable is not found in directories, it will show ERROR NO SUCH KEY 2) if doc variable is not found in directories, it will show ERROR NO SUCH VALUE def change_shelf(data): docnum = '11-2' shelf = '3' for key, val in data.items(): for doc in val: if doc == docnum: data[shelf].append(doc) data[key].remove(doc) print('OK') return data directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] }

def change_shelf(data, docnum, shelf):
    if shelf not in data:
        return 'ERROR NO SUCH KEY'
    for val in data.values():
        if docnum in val:
            # Move the document from its current shelf to the target shelf
            val.remove(docnum)
            data[shelf].append(docnum)
            print('OK')
            return data
    return 'ERROR NO SUCH VALUE'
directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] }

Python
View
generate functionSat, 04 Mar 2023

Correct the code to correctly output the dataframe with this data: gdpdiff = pd.DataFrame({'Top1': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1], 'Top20': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20]}) gdpdiff

def compare_gdp_and_happiness(df1, df2): df_merge = df1.merge(df2, how='inner', left_on='Country or region', right_on='entity') df_merge = df_merge[['Country or region', 'GDP per capita', 'happinessScore']] df_merge['GDP per capita'] = df_merge['GDP per capita'].apply(pd.to_numeric) df_merge['happinessScore'] = df_merge['happinessScore'].apply(pd.to_numeric) df_merge_sort = df_merge.sort_values(by='happinessScore', ascending=False) #df_merge_sort df_merge_top_1 = df_merge_sort.head(20)[0:1] df_merge_top_1.rename(columns={'Country or region': 'Top1', 'GDP per capita': 'Top1 GDP', 'happinessScore
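The generated function above is cut off and addresses a different task. For the original request, passing two DataFrames as dict values to pd.DataFrame is the likely source of the problem. One hedged alternative is to take the top 20 once and pick the first and twentieth rows by position; df19 below is a made-up stand-in with 25 placeholder countries:

```python
import pandas as pd

# Hypothetical stand-in for df19: 25 made-up countries
df19 = pd.DataFrame({
    'Country or region': [f'Country {i}' for i in range(25)],
    'GDP per capita': [2.0 - 0.05 * i for i in range(25)],
})

# Top 20 countries by GDP per capita, highest first
top20 = (df19[['Country or region', 'GDP per capita']]
         .sort_values(by='GDP per capita', ascending=False)
         .head(20))

# Keep the 1st and 20th rows and label them
gdpdiff = top20.iloc[[0, 19]].copy()
gdpdiff.index = ['Top1', 'Top20']
print(gdpdiff)
```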

Python
View
generate functionFri, 23 Dec 2022

You have a bunch of red and blue marbles. To start the game you grab a handful of marbles of each color and put them into the bag, keeping track of how many of each color go in. You take turns reaching into the bag, guessing a color, and then pulling one marble out. You get a point if you guessed correctly. You've decided to write a function, guessBlue() to help automatically calculate whether you should guess "blue" or "red". The function should take four arguments: the number of blue marbles you put in the bag to start the number of red marbles you put in the bag to start the number of blue marbles pulled out so far (always lower than the starting number of blue marbles) the number of red marbles pulled out so far (always lower than the starting number of red marbles) guessBlue() should return the probability of drawing a blue marble, expressed as a float. For example, guessBlue(5, 5, 2, 3) should return 0.6.

def guessBlue(blue_start, red_start, blue_pulled, red_pulled): return (blue_start - blue_pulled)/(blue_start - blue_pulled + red_start - red_pulled)

Python
View
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe:
         date  av_temp  deviations country
1  1743-12-01      NaN         NaN   Åland
2  1744-01-01      NaN         NaN   Åland
3  1744-02-01      NaN         NaN   Åland
4  1744-03-01      NaN         NaN   Åland
9  1744-08-01      NaN         NaN   Åland
Create a new column year from date, converting every date value to its year. Example: 1744-01-01 in the date column becomes 1744.

def get_year(date):
    # date is a string such as '1744-01-01'
    return int(date[:4])
def get_month(date):
    return int(date[5:7])
def get_day(date):
    return int(date[8:])
df['year'] = df['date'].apply(get_year)
df['month'] = df['date'].apply(get_month)
df['day'] = df['date'].apply(get_day)

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Leave only 1 unique row in each 'song' column in case the 'peak_position' column has a value of 1

df[df['peak_position'] == 1].drop_duplicates(subset = 'song', keep = 'first')

Python
View
generate functionThu, 16 Feb 2023

There is a df Pandas dataframe: id member_id loan_amnt funded_amnt funded_amnt_inv term int_rate installment grade sub_grade 0 1077501 1296599 5000 5000 4975 36 months 10.65% 162.87 B B2 ... Computer 27.65 0.0 735.0 739.0 1.0 3.0 13648.0 83.7% 9.0 1 1077430 1314167 2500 2500 2500 60 months 15.27% 59.83 C C4 ... bike 1.00 0.0 740.0 744.0 5.0 3.0 1687.0 9.4% 4.0 Write a function that will loop through all the column names in this dataframe and apply try .astype('Int64') to them. In case of an error the function should just execute pass

def try_to_int(df):
    for col in df.columns:
        try:
            df[col] = df[col].astype('Int64')
        except Exception:
            # Column cannot be converted; leave it unchanged
            pass
    return df

Python
View
generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

import pandas as pd
from sklearn.model_selection import train_test_split
def split_data(data):
    # data is a list of (X, y) tuples; keeping each pair in one row means
    # X and y always have the same number of samples
    df = pd.DataFrame(data, columns=['X', 'y'])
    train, test = train_test_split(df, test_size=0.3, random_state=42)
    # return a tuple of four lists: X_train, X_test, y_train, y_test
    return (train['X'].tolist(), test['X'].tolist(), train['y'].tolist(), test['y'].tolist())
split_data([(1, 2), (3, 4), (5, 6)])

Python
View
generate functionThu, 19 Jan 2023

There is a Pandas dataframe:
   place     decade  perc_of_5star
0      1  2010-2020          2.300
1      2  1900-1910          1.379
2      3  1970-1980          1.179
3      4  2000-2010          1.176
4      5  1960-1970          1.133
Build a horizontal bar chart with perc_of_5star columns in descending order and decade values on the y-axis

import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(
    {'place': [1, 2, 3, 4, 5],
     'decade': ['2010-2020', '1900-1910', '1970-1980', '2000-2010', '1960-1970'],
     'perc_of_5star': [2.3, 1.379, 1.179, 1.176, 1.133]})
# Ascending sort puts the largest bar at the top of a horizontal bar chart
df.sort_values(by='perc_of_5star', ascending=True).plot(kind='barh', x='decade', y='perc_of_5star', legend=False)
plt.show()

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 3819 Imagine Dragons Radioactive, Natural, Believer, Thunder, On To... 2012 11101 9304.0 2674 Ed Sheeran Thinking Out Loud, Photograph, Sing, Don't, On... 2014 10685 10039.0 9590 The Weeknd Starry Eyes, Save Your Tears, Acquainted, Blin... 2022 9429 6713.0 5812 Mariah Carey Without You/Never Forget You, Don't Forget Abo... 1994 8735 6616.0 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0 Write a function that groups the Performer lines by decade (for example, 1996 is 1990-2000)

import pandas as pd
df = pd.DataFrame({'name': ['Taylor Swift', 'Imagine Dragons', 'Ed Sheeran'], 'year': [2008, 2012, 2014]})
def decade_equals(year):
    # 2008 -> 2000, 2014 -> 2010
    return year // 10 * 10
# Group by the decade of each row's year, not by the row index
df.groupby(df['year'].apply(decade_equals))['name'].apply(list)

Python
View
generate functionMon, 26 Dec 2022

There are two lists. The first (stnums) contains rows that are considered duplicates. The second list is a list of students. Write a function that replaces the values in the second list that match the first (but only the 2nd, 3rd value, and so on, not the first one). stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004

Python
View
generate functionSun, 18 Dec 2022

Replace re.split with regular split to get the same result (you can use the json library and the loads function) The code: f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

import json
def split_func(f):
    # Assumes each line of the log is a JSON object with two fields
    purchases = {}
    for line in f:
        values = list(json.loads(line.strip()).values())
        purchases[values[0]] = values[1]
    return purchases
f = open('purchase_log.txt', encoding='utf-8')
split_func(f)

Python
View
generate functionThu, 22 Dec 2022

Return a new array consisting of elements which are multiple of their own index in input array (length > 1). Some cases: [22, -6, 32, 82, 9, 25] => [-6, 32, 25] [68, -1, 1, -7, 10, 10] => [-1, 10] [-56,-85,72,-26,-14,76,-27,72,35,-21,-67,87,0,21,59,27,-92,68] => [-85, 72, 0, 68] [22] should equal [-6, 32, 25] [68] should equal [-1, 10] [11] should equal [-11] [-56] should equal [-85, 72, 0, 68] [28] should equal [38, -44, -99] [-1] should equal [-49, 8, -60, 35]

def multiple_of_index(arr):
    # Check i != 0 first so that num % i never divides by zero
    return [num for i, num in enumerate(arr) if i != 0 and num % i == 0]

Python
View
generate functionWed, 25 Jan 2023

There is a Pandas dataframe news_title 0 /world/ 1 /latest/ 2 /?updated=top 3 /politics/36188461-s-marta-zhizn-rossiyan-susc... 4 /world/36007585-tramp-pridumal-kak-reshit-ukra... 5 /science/36157853-nasa-sobiraet-ekstrennuyu-pr... 6 /video/36001498-poyavilis-pervye-podrobnosti-g... 7 /world/36007585-tramp-pridumal-kak-reshit-ukra... 8 /science/ 9 /sport/ Filter out this dataframe and leave only the url's with the news sctructure (containing 8 digits and heading) in it, using the str.contains method

import pandas as pd
news = pd.read_csv('https://raw.githubusercontent.com/ml-mipt/ml-mipt/basic/homeworks/Lab1_python_basics/news_sample.csv')
# Keep only URLs that contain an 8-digit id followed by a heading slug
news[news['news_title'].str.contains(r'/\d{8}-[a-z]+')]

Python
View
generate functionMon, 26 Dec 2022

A matrix matrix is given. matrix = [ [0,1,2,4,8], [6,2,2,1,9], [3,3,3,3,3], [4,6,7,1,2], [5,7,3,4,0] ] Write code (in pure Python, no third-party libraries, no list comprehensions) that will transpose it (that is, make columns out of rows) into matrix_t: matrix_t = [ [0, 6, 3, 4, 5], [1, 2, 3, 6, 7], [2, 2, 3, 7, 3], [4, 1, 3, 1, 4], [8, 9, 3, 2, 0] ]

# First solution
def transpose(matrix):
    new_matrix = []
    for i in range(len(matrix[0])):
        # Row i of the result is column i of the input
        row = []
        for j in range(len(matrix)):
            row.append(matrix[j][i])
        new_matrix.append(row)
    return new_matrix

Python
View
generate functionThu, 09 Feb 2023

There is 'actors' Pandas dataframe: actorid name date_of_birth birth_city birth_country height_inches biography gender ethnicity networth 1002 1155 Jackie Chan 1954-04-07 Victoria Peak Hong Kong 68.0 Hong Kong's cheeky, lovable and best known fil... Male NaN 400000000.0 1387 69 Keanu Reeves 1964-09-02 Beirut Lebanon 73.0 Keanu Charles Reeves, whose first name means "... Male Lebanese 360000000.0 2252 141 Sean Connery 1930-08-25 Edinburgh UK 74.0 The tall, handsome and muscular Scottish actor... Male White 350000000.0 291 6 Bruce Willis 1955-03-19 Idar-Oberstein West Germany 72.0 Actor and musician Bruce Willis is well known ... Male White 250000000.0 Write a function that creates a dictionary with all the countries of North America, Western Europe. Then apply this function so that only the rows NOT corresponding to this dictionary are left in the dataframe.

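def NAWE(country):
    # Dictionary of regions; the country lists are illustrative, not exhaustive
    regions = {
        'North America': ['Canada', 'United States'],
        'Western Europe': ['United Kingdom', 'Germany', 'Netherlands'],
    }
    for countries in regions.values():
        if country in countries:
            return False
    return True
actors = actors[actors['birth_country'].apply(NAWE)]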

Python
View
generate functionTue, 21 Mar 2023

There are 3 lists: water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1] nutri = [1,2,4,6,5,6,7,5,4,5,6,7,4,3,5,5,6,5,4,3,5] mineral =[2,1,1,3,2,4,2,4,5,4,3,2,3,2,3,1,3,4,5,1,4] add them to df Pandas dataframe in the following format: index treatments value 0 0 A 25 1 1 A 30 2 2 A 28 3 3 A 36 4 4 A 29 5 0 B 45 6 1 B 55 7 2 B 29 8 3 B 56 9 4 B 40 10 0 C 30 11 1 C 29 12 2 C 33

import pandas as pd import numpy as np def func(a, b, c): a = pd.DataFrame(a) b = pd.DataFrame(b) c = pd.DataFrame(c) a['treatment'] = 'A' b['treatment'] = 'B' c['treatment'] = 'C' a['index'] = np.arange(len(a)) b['index'] = np.arange(len(b)) c['index'] = np.arange(len(c)) a = a.rename(columns={0: "value"}) b = b.rename(columns={0: "value"}) c = c.rename(columns={0: "value"}) a = a[['index', 'treatment', 'value']] b = b[['index', 'treatment', 'value']] c = c[['index', 'treatment', 'value']] d =
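The generated function above is cut off at `d =`. As a separate, shorter sketch of the same idea (one small frame per list, stacked with pd.concat), assuming the labels A, B and C map to water, nutri and mineral in that order, which the prompt does not state:

```python
import pandas as pd

water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1]
nutri = [1,2,4,6,5,6,7,5,4,5,6,7,4,3,5,5,6,5,4,3,5]
mineral = [2,1,1,3,2,4,2,4,5,4,3,2,3,2,3,1,3,4,5,1,4]

frames = []
for label, values in zip(['A', 'B', 'C'], [water, nutri, mineral]):
    # 'index' restarts at 0 for every treatment, as in the target layout
    frames.append(pd.DataFrame({
        'index': range(len(values)),
        'treatments': label,
        'value': values,
    }))

# Stack the three frames and renumber the rows from 0
df = pd.concat(frames, ignore_index=True)
print(df.head())
```

Building each frame from its own list also copes with the lists having different lengths, as they do here.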

Python
View
generate functionFri, 23 Dec 2022

You have a bunch of red and blue marbles. To start the game you grab a handful of marbles of each color and put them into the bag, keeping track of how many of each color go in. You take turns reaching into the bag, guessing a color, and then pulling one marble out. You get a point if you guessed correctly. You've decided to write a function, guessBlue() to help automatically calculate whether you should guess "blue" or "red". The function should take four arguments: the number of blue marbles you put in the bag to start the number of red marbles you put in the bag to start the number of blue marbles pulled out so far (always lower than the starting number of blue marbles) the number of red marbles pulled out so far (always lower than the starting number of red marbles) guessBlue() should return the probability of drawing a blue marble, expressed as a float. For example, guessBlue(5, 5, 2, 3) should return 0.6.

def guessBlue(blueStart, redStart, bluePulled, redPulled): return (blueStart - bluePulled) / (blueStart - bluePulled + redStart - redPulled) guessBlue(5, 5, 2, 3)

Python
View
generate functionSun, 18 Dec 2022

Replace re.split with regular split to get the same result (you can use the json library and the loads function) The code: f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

f = open('purchase_log.txt', encoding='utf-8')
purchases = {}
for i, line in enumerate(f):
    # Turning ':' into ',' first makes a plain split equivalent to
    # re.split(r",|:", ...)
    line = line.strip().replace(':', ',').split(',')
    keys = line[1]
    values = line[3]
    purchases[keys] = values

Python
View
generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

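import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
# Simulated results, since no data was supplied with the task
np.random.seed(1)
A = np.random.normal(0, 1, 50)
B = np.random.normal(1, 1, 50)
# H0: the mean effects are equal; H1: they differ (two-sided Student's t-test)
t_stat, p_value = st.ttest_ind(A, B)
print(t_stat, p_value)
# Boxplot and histogram for a visual check
plt.boxplot([A, B])
plt.show()
plt.hist(A, alpha=0.5, label='A')
plt.hist(B, alpha=0.5, label='B')
plt.legend()
plt.show()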

Python
View
generate functionSat, 25 Feb 2023

There is a Pandas dataframe:
   loan_amnt int_rate
0       5000   10.65%
1       2500   15.27%
2       2400   15.96%
3      10000   13.49%
4       3000   12.69%
5       5000    7.90%
6       7000   15.96%
You need to split the values of the int_rate column into categories. Important: The int_rate column is in str data format. Write a function that will create the rate_group column and add values to it as follows:
1) if int_rate > '15.00%', rate_group should be '>15'
2) if int_rate is between '10.00%' and '15.00%', then rate_group should be '10-15'
You can use regex

def rate_group(x):
    # Compare numerically: as strings, '7.90%' would sort above '15.00%'
    rate = float(x.rstrip('%'))
    if rate > 15.00:
        return '>15'
    if 10.00 <= rate <= 15.00:
        return '10-15'
df['rate_group'] = df['int_rate'].apply(rate_group)

Python
View
generate functionMon, 12 Dec 2022

Create a program that will take two lists of integers, a and b. Each list will consist of 3 positive integers above 0, representing the dimensions of cuboids a and b. You must find the difference of the cuboids' volumes regardless of which is bigger. For example, if the parameters passed are ([2, 2, 3], [5, 4, 1]), the volume of a is 12 and the volume of b is 20. Therefore, the function should return 8. Write solution in one line of code.

from functools import reduce
def find_difference(a, b): return abs(reduce(lambda x, y: x*y, a) - reduce(lambda x, y: x*y, b))

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe named 'dfp': performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

Python
View
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type. Sort them in descending order by incident_type column

df_new.groupby(['route', 'incident_type']).size().sort_values(ascending=False).head()

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column.

def group_by_performer(df):
    # Group by performer and join each performer's unique songs with ", "
    hits = df.groupby('performer')['song'].agg(lambda s: ', '.join(s.unique()))
    # One row per performer, with the joined songs in a new "hits" column
    return hits.reset_index(name='hits')

group_by_performer(df)

Python
generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g def total_ingredients(cook_book): dish = 'salad' portions = 5 grams = cook_book['quantity'] * portions if key == 'salad': grams = 'quantity' * portions print(grams) cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} total_ingredients(cook_book)

def total_ingredients(cook_book, dish='salad', portions=5):
    # Multiply each ingredient's per-serving quantity by the number of portions
    for ingredient in cook_book[dish]:
        total = ingredient['quantity'] * portions
        print(f"{ingredient['ingridient_name'].capitalize()}: {total} {ingredient['measure']}")

cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}
total_ingredients(cook_book)

Python
generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

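question6 = 'What is the most preferred working environment for you.'
question6 = df[question6].value_counts()
label = question6.index
counts = question6.values
# Categories go on the x-axis and counts on the y-axis, so the bars must be vertical (no orientation='h')
fig = px.bar(x=label, y=counts)
# The title is Russian for "Which working environment do you prefer most?"
fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?')
fig.show()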

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

df[df['name'].duplicated(keep=False)].sort_values('name')

Python
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write code with value_counts and sorting by decade column and rating

def sort_df(df): return df.groupby('decade').rating.value_counts().sort_index()

Python
generate functionTue, 13 Dec 2022

Given an array of numbers, check if any of the numbers are the character codes for lower case vowels (a, e, i, o, u). If they are, change the array value to a string of that vowel. Return the resulting array IN THE SAME ORDER! Example: Input: inp = [118, "u",120,121,"u",98,122,"a",120,106,104,116,113,114,113,120,106 ] Output: [118,117,120,121,117,98,122,97,120,106,104,116,113,114,113,120,106 ]

def vowel_2_index(inp):
    # Character codes of the lower-case vowels (117 is "u"; 118 is "v")
    vowels = {97: "a", 101: "e", 105: "i", 111: "o", 117: "u"}
    # Replace vowel codes with their letters; leave everything else as is
    return [vowels.get(x, x) if isinstance(x, int) else x for x in inp]

inp = [118, "u", 120, 121, "u", 98, 122, "a", 120, 106, 104, 116, 113, 114, 113, 120, 106]
vowel_2_index(inp)

Python
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type

#df_new.route.value_counts() def df_count(dataframe, column): return dataframe[column].value_counts() df_count(df_new, 'route')

Python
generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! Example: dish = 'salad' portions = 5 The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}

def get_ingredients(dish, portions): recipes = cook_book.get(dish, []) if not recipes: print('No such dish') return print(f'You need to cook {dish} for {portions} portions') for recipe in recipes: print(f'{recipe["ingridient_name"]}: {recipe["quantity"] * portions} {recipe["measure"]}')

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 3819 Imagine Dragons Radioactive, Natural, Believer, Thunder, On To... 2012 11101 9304.0 2674 Ed Sheeran Thinking Out Loud, Photograph, Sing, Don't, On... 2014 10685 10039.0 9590 The Weeknd Starry Eyes, Save Your Tears, Acquainted, Blin... 2022 9429 6713.0 5812 Mariah Carey Without You/Never Forget You, Don't Forget Abo... 1994 8735 6616.0 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0 Write a function divide_decade(df, decade) which, when a decade is specified, will perform the appropriate filter of the chart_debut column on the grouped Performer column. for example divide_decade(df, '1980-1990') 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0

import pandas as pd

def divide_decade(df, decade):
    # '1980-1990' -> start year 1980 (inclusive), end year 1990 (exclusive)
    start, end = (int(year) for year in decade.split('-'))
    return df[(df['chart_debut'] >= start) & (df['chart_debut'] < end)]

divide_decade(df, '1980-1990')

Python
generate functionTue, 13 Dec 2022

Write a function that outputs True if the letters in it are one after the other in the string, and False otherwise. def goes_after(word: str, first: str, second: str) -> bool: s = '' for i, let in enumerate(word): print (let[i] + let[i+1]) print(first + second) if let[i] + let[i-1] == first + second: return True else: return False Examples: goes_after("world", "o", "r") Result: True goes_after("world", "w", "r") Result: False

def goes_after(word: str, first: str, second: str) -> bool:
    # Check every pair of adjacent letters for "first" immediately followed by "second"
    for i in range(len(word) - 1):
        if word[i] == first and word[i + 1] == second:
            return True
    return False

Python
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

def get_movie_rating(df):
    # One row per unique title with its mean rating, rounded to one decimal place
    return df.groupby('title')['rating'].mean().round(1).reset_index()

get_movie_rating(df)

Python
generate functionMon, 26 Dec 2022

Write a code that looks for a repeating student number and replaces it with "9090". lst = [ ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"] ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def search_and_replace(lst, new_id): for i in lst: if i[0] == new_id: i[0] = '9090' print(lst) search_and_replace(lst, '4004')

Python
generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

def train_test_split(X, y, test_size=0.3, random_state=42): """ Input: X: A pandas dataframe. y: A pandas series. Corresponds to the labels. test_size: Test size as a fraction of the total dataset. """ # YOUR CODE HERE raise NotImplementedError() return X_train, X_test, y_train, y_test

Python
generate functionMon, 27 Mar 2023

There is a dataframe df with two columns: Category and Message in the following format: Category Message 0 ham Go until jurong point, crazy... Available only ... 1 ham Ok lar... Joking wif u oni... 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 3 ham U dun say so early hor... U c already then say... 4 ham Nah I don't think he goes to usf, he lives aro... Split it into samples X = df['Message'] y = df['Category'] so that there are float numbers instead of object text everywhere.

from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Python
generate functionWed, 08 Mar 2023

Upgrade the code so that a roll of 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090 will drop 1 dollar for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

def upgrade(num, money, successes):
    # money and successes are passed in and returned; assigning to them inside
    # the function without passing them in raises UnboundLocalError
    if num == 777: money += 200; successes += 1
    elif num == 999: money += 100; successes += 1
    elif num == 555: money += 50; successes += 1
    elif num == 333: money += 15; successes += 1
    elif num == 111: money += 10; successes += 1
    elif ends77(num): money += 5; successes += 1
    elif ends7(num): money += 3; successes += 1
    elif ends00(num): money += 2; successes += 1
    elif ends0(num): money += 1; successes += 1
    else: money -= 1
    return money, successes

Python
generate functionThu, 23 Mar 2023

This code gives an error AttributeError: 'MultiComparison' object has no attribute 'plot_simultaneous' from statsmodels.stats.multicomp import pairwise_tukeyhsd tukey.plot_simultaneous(comparison_name="C") Fix it!

from statsmodels.stats.multicomp import pairwise_tukeyhsd tukey = pairwise_tukeyhsd(endog = df['A'], # Data groups = df['B'], # Groups alpha=0.05) # Significance level tukey.summary() fig = tukey.plot_simultaneous(comparison_name="C") fig.show()

Python
generate functionSun, 11 Dec 2022

A list of arbitrary length is given. It is necessary to write code, which on the basis of the original list will compose a nesting level dictionary of the same length as the original list. Examples of how the program works: my_list = ['2018-01-01', 'yandex', 'cpc', 100] Result: {'2018-01-01': {'yandex': {'cpc': 100}}} my_list = ['a', 'b', 'c', 'd', 'e', 'f'] Result: {'a': {'b': {'c': {'d': {'e': 'f'}}}}}

def make_dict(my_list): result = {} if len(my_list) > 2: result[my_list[0]] = make_dict(my_list[1:]) else: result[my_list[0]] = my_list[1] return result my_list = ['a', 'b', 'c'] print(make_dict(my_list))

Python
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

def decade_of_the_movie(movie_rating):
    if movie_rating > 5:
        print("Most movies were in the decade")
    elif movie_rating <= 4:
        print("Most movies were in the decade")
    else:
        print("No movies were found")

decade_of_the_movie(5)
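The second task in the prompt above (assigning a decade to a year) could be sketched roughly as follows. The function name `year_to_decade` is hypothetical, and the sketch assumes that any value that cannot be read as a year should map to NaN, as in Example 4.

```python
import math

def year_to_decade(year):
    """Return a label such as '1950-1960' for a year, or NaN if it is not a year."""
    try:
        start = int(year) // 10 * 10  # e.g. 1954 -> 1950
    except (TypeError, ValueError):
        return math.nan  # non-numeric input such as 'raovyavas'
    return f"{start}-{start + 10}"

print(year_to_decade(1889))  # 1880-1890
print(year_to_decade(2011))  # 2010-2020
```

With pandas, the same helper could presumably be applied to the "Year" column with `apply` to fill the "Decade of Release" column.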

Python
generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a "data" dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

import numpy as np
import pandas as pd

sl = [[-0.90068117], [-1.14301691], [-1.38535265], [-1.50652052], [-1.02184904], [-0.53717756], [-1.50652052], [-1.02184904], [-1.74885626], [-1.14301691]]
sw = [[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.]]
# Flatten the nested single-element lists so each cell holds a float, not a list
data = pd.DataFrame({'SL': np.ravel(sl), 'SW': np.ravel(sw)})
print(data)

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

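def duplicates(data):
    # keep=False marks every occurrence of a duplicated name, including the first
    return data[data['name'].duplicated(keep=False)].sort_values('name')

result = duplicates(data)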

Python
generate functionMon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

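def group_list(grnum):
    # Full names (surname, first name, patronymic) of students whose group is grnum
    students = sorted(' '.join(info[0:3]) for info in dct.values() if info[4] == grnum)
    # Number the alphabetically ordered names starting from 1
    return [f'{i}. {name}' for i, name in enumerate(students, 1)]

group_list('BST161')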

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 3 2021-07-01 Alushta Breeze 36 25 75475 27972 21060 26443 4 2021-07-01 Alushta Moreon 20 17 51091 15540 11700 23851 6 2021-07-02 Yalta Rapsodia 33 27 75276 25641 19305 30330 Write a lambda function with loc method that filters out all data and leaves only revenue > 40000 and total_rooms < 30

df.loc[lambda x: (x["revenue"] > 40000) & (x["total_rooms"] < 30)]

Python
generate functionMon, 26 Dec 2022

Write a code that looks for a repeating student number and replaces it with "9090". lst = [ ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"] ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def check_id(lst):
    seen = set()
    for row in lst:
        # A student number that was already seen is a repeat: replace it with "9090"
        if row[0] in seen:
            row[0] = "9090"
        else:
            seen.add(row[0])
    return lst

Python
generate functionTue, 21 Feb 2023

There is a Pandas dataframe: Last Updated Installs 7479 2010-05-21 100000 7430 2011-01-30 50000 10282 2011-03-16 100000 8418 2011-04-11 5000000 8084 2011-04-16 50000 9067 2011-04-18 50000 5144 2011-05-12 100 7237 2011-06-23 1000 10460 2011-06-26 5000 1544 2011-06-29 1000000 7080 2011-07-10 5000000 8200 2011-09-20 50000 5561 2011-09-22 1000000 Write a function that creates a 'year' column with only the year from the 'Last Updated' column (which contains dates in 'Timestamp' object format) added to it

data = [['2010-05-21',100000], ['2011-01-30',50000], ['2011-03-16',100000], ['2011-04-11',5000000], ['2011-04-16',50000], ['2011-04-18',50000], ['2011-05-12',100], ['2011-06-23',1000], ['2011-06-26',5000], ['2011-06-29',1000000], ['2011-07-10',5000000], ['2011-09-20',50000], ['2011-09-22',1000000]] df = pd.DataFrame(data, columns = ['Last Updated', 'Installs']) def date_to_year(a): return pd.to_datetime(a).year df['year'] = df.apply(lambda x: date_to_year(x['Last Updated']), axis=1)

Python
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of "date" string values for all "operator"

import pandas as pd

df_new = pd.read_csv('file.csv')
# "date" holds strings, so mean() is not defined for it.
# Count the date values per operator, then average those counts.
df_new.groupby('operator')['date'].count().mean()

Python
generate functionThu, 02 Feb 2023

There is a Pandas dataframe: year date route operator group_name bus_garage bus_park injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Write a grouped_operators function that groups all values by operator column and sums all dates for each unique operator value

def grouped_operators(df): return df.groupby(["operator"])["date"].sum()

Python
generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function def how_much_water(L,X,N) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes): return water * 1.1 ** (clothes - load)

Python
generate functionThu, 30 Mar 2023

There are two vectors: film_2 = np.array([0,1,0,1,0,0]) film_4 = np.array([1,0,0,1,0,1]) We know that the angle between them is 65.9 degrees. Construct a graph with a visual representation of the vectors and the angle between them.

import matplotlib.pyplot as plt import numpy as np # we need 2 vectors film_2 = np.array([0,1,0,1,0,0]) film_4 = np.array([1,0,0,1,0,1]) # We need to know how to plot vectors # We need to know how to get the angle between them # We need to know how to make a graph
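The answer above stops at its comments. As a minimal sketch using only the standard library, the stated angle can be checked and the two six-dimensional vectors mapped to 2-D stand-ins that keep their lengths and the angle between them. The helper names `norm` and `angle_deg` and the points `p2` and `p4` are hypothetical. The resulting coordinates could then be drawn as arrows with any plotting library.

```python
import math

film_2 = [0, 1, 0, 1, 0, 0]
film_4 = [1, 0, 0, 1, 0, 1]

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def angle_deg(u, v):
    """Angle between two vectors in degrees, from the cosine formula."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(dot / (norm(u) * norm(v))))

theta = angle_deg(film_2, film_4)

# 2-D stand-ins: film_2 lies along the x-axis, film_4 is rotated by theta
p2 = (norm(film_2), 0.0)
p4 = (norm(film_4) * math.cos(math.radians(theta)),
      norm(film_4) * math.sin(math.radians(theta)))

print(round(theta, 1))  # 65.9
```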

Python
generate functionMon, 12 Dec 2022

In this simple exercise, you will build a program that takes a value, integer , and returns a list of its multiples up to another value, limit . If limit is a multiple of integer, it should be included as well. There will only ever be positive integers passed into the function, not consisting of 0. The limit will always be higher than the base. For example, if the parameters passed are (2, 6), the function should return [2, 4, 6] as 2, 4, and 6 are the multiples of 2 up to 6.

def find_multiples(integer, limit): return list(range(integer, limit + 1, integer))

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 3 2021-07-01 Alushta Breeze 36 25 75475 27972 21060 26443 4 2021-07-01 Alushta Moreon 20 17 51091 15540 11700 23851 6 2021-07-02 Yalta Rapsodia 33 27 75276 25641 19305 30330 Write a lambda function with loc method that filters out all data and leaves only revenue > 40000 and total_rooms < 30

df.loc[lambda x: (x['revenue'] > 40000) & (x['total_rooms'] < 30)]

Python
generate functionMon, 12 Dec 2022

Complete function saleHotdogs/SaleHotDogs/sale_hotdogs, function accepts 1 parameter:n, n is the number of hotdogs a customer will buy, different numbers have different prices (refer to the following table), return how much money will the customer spend to buy that number of hotdogs. number of hotdogs price per unit (cents) n < 5 100 n >= 5 and n < 10 95 n >= 10 90

def sale_hotdogs(n): if n < 5: return n * 100 elif n < 10: return n * 95 else: return n * 90

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

df = pd.read_csv('customers.csv') df.sort_values(by='name', inplace=True) #df[(df['name'].duplicated(keep=False))].sort_values(by='name', inplace=True) df[(df['name'].duplicated(keep=False))].sort_values(by='name').head(10)

Python
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of "date" string values for all "operator"

def mean_str_len(data, column1, column2): # data - a dataframe # column1 - a string, the name of a column # column2 - a string, the name of another column # YOUR CODE HERE ...

Python
generate functionWed, 08 Feb 2023

def get_needed_posts(query): site = pd.DataFrame() for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) time.sleep(0.3) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles-list__item') for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) res = get_needed_posts(query) res Modify the Python function above so that if it finds a repeated title or link, it will not add it to the return

def get_needed_posts(query):
    rows = []
    seen_titles, seen_links = set(), set()
    for q in query:
        URL = parseurl + 'search/'
        params = {'q': q}
        req = requests.get(URL, params=params)
        time.sleep(0.3)
        soup = BeautifulSoup(req.text)
        articles = soup.find_all('article', class_='tm-articles-list__item')
        for article in articles:
            try:
                title = article.find('h2', class_='tm-article').text
                date = article.find('span', class_='tm-article').text.strip()
                link = article.find('h2', class_='tm-article').find('a').get('href')
            except AttributeError:
                # Skip articles that lack one of the expected elements
                continue
            # Skip posts whose title or link has already been collected
            if title in seen_titles or link in seen_links:
                continue
            seen_titles.add(title)
            seen_links.add(link)
            rows.append({'date': date, 'title': title, 'link': 'https://habr.com' + link})
    return pd.DataFrame(rows).reset_index(drop=True)

Python
generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In your answer, specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers.

from scipy.stats import norm

def confidence_interval(n, mean, sig, conf):
    # Half-width: normal quantile times the standard error of the mean
    h = norm.ppf((1 + conf) / 2) * sig / n ** 0.5
    # Length of the interval (right end minus left end), rounded to an integer
    return round(2 * h)
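For a dependency-free cross-check, the same length can be sketched with the standard library's `statistics.NormalDist`. This assumes the standard deviation is known, so that a normal quantile applies. The function name `ci_length` and the sample numbers are hypothetical. The sample mean is omitted because it shifts the interval without changing its length.

```python
from statistics import NormalDist

def ci_length(n, sig, conf):
    """Length of a two-sided confidence interval for the mean (known sigma)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)  # two-sided normal quantile
    return 2 * z * sig / n ** 0.5             # twice the half-width

# Hypothetical inputs: n = 100, sigma = 10, 95% confidence
print(round(ci_length(n=100, sig=10, conf=0.95)))  # 4
```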

Python
generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return The result of the function should be three values: 12, 28, 21

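from scipy.stats import t

def interval(n, mean, sig, conf):
    # Half-width: t-quantile with n - 1 degrees of freedom times the standard error
    h = t.ppf((1 + conf) / 2, n - 1) * sig / n ** 0.5
    # Length of the interval (right end minus left end), rounded to an integer
    return round(2 * h)

interval(n=20, mean=3.3, sig=2.4, conf=0.95)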

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: time_on_chart min max performer "Groove" Holmes 1 11 "Little" Jimmy Dickens 1 10 "Pookie" Hudson 1 1 "Weird Al" Yankovic 1 20 Sort these rows by the time_on_chart and max columns, in descending order, and display the first 20 rows

import pandas as pd

df = pd.DataFrame({'time_on_chart': [1, 1, 1, 1], 'max': [11, 10, 1, 20]}, index=['"Groove" Holmes', '"Little" Jimmy Dickens', '"Pookie" Hudson', '"Weird Al" Yankovic'])
# Sort by both columns in descending order and show the first 20 rows
df.sort_values(['time_on_chart', 'max'], ascending=False).head(20)

Python
generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) calculate the correlation of this column with other columns: pulse, respiratory_rate, extremities_temp, pain_outcome 3) fill all NaNs in the 'rectal_temp' column with the average values multiplied by the correlation weight

def fill_na_by_corr(df, col_name='rectal_temp'): df_temp = df[df.columns[df.isna().any()].tolist()].drop(['outcome'], axis = 1) df_temp = df_temp.dropna(subset = [col_name]) cor = df_temp.corr()[[col_name]].drop([col_name]) for i in cor.index: if cor.loc[i, col_name] > 0: df[i] = df[i].fillna(value = df[i].mean() * cor.loc[i, col_name]) else: df[i] = df[i].fillna(value = df[i].mean() / cor.loc[i, col_name]) return df

Python
View
generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

from scipy.stats import ttest_ind import numpy as np a = np.array([1,2,3,4,5]) b = np.array([5,6,7,8,9]) ttest_ind(a, b)
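The call above returns the statistic and p-value but does not spell out the hypotheses or the decision. A minimal standard-library sketch of the same pooled-variance (Student's) t-test on the same two samples follows. The helper name `student_t`, the significance level of 0.05, and the table value 2.306 (two-sided, 8 degrees of freedom) are assumptions of this sketch, not part of the original answer. H0: the two groups have equal means; H1: the means differ.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Pooled-variance (Student's) t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    # weighted average of the two sample variances
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / dof
    t_stat = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t_stat, dof

a = [1, 2, 3, 4, 5]
b = [5, 6, 7, 8, 9]
t_stat, dof = student_t(a, b)

# Assumed critical value from a standard t-table: alpha = 0.05, two-sided, dof = 8
T_CRITICAL = 2.306
reject_h0 = abs(t_stat) > T_CRITICAL
print(t_stat, dof, reject_h0)
```

For these samples the statistic is -4 with 8 degrees of freedom, so H0 is rejected at the assumed 5% level. The boxplot and histogram requested in the task would still need a plotting library.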

Python
View
generate functionThu, 22 Dec 2022

The Collatz conjecture (also known as 3n+1 conjecture) is a conjecture that applying the following algorithm to any number we will always eventually reach one: [This is written in pseudocode] if(number is even) number = number / 2 if(number is odd) number = 3*number + 1 #Task Your task is to make a function hotpo that takes a positive n as input and returns the number of times you need to perform this algorithm to get n = 1. #Examples hotpo(1) returns 0 (1 is already 1) hotpo(5) returns 5 5 -> 16 -> 8 -> 4 -> 2 -> 1 hotpo(6) returns 8 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 hotpo(23) returns 15 23 -> 70 -> 35 -> 106 -> 53 -> 160 -> 80 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1

def hotpo(n):
    # count the Collatz steps needed to reach 1
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

hotpo(23)  # 15

Python
View
generate functionMon, 20 Feb 2023

The 'Size' column of the Pandas dataframe contains data in the following format: Size 19M 14M 8.7M 25M 250k 429k 728k 3M Write a function that uses regex to convert strings: - that end in 'M' to a number with 5 zeros (e.g. 2.8M should be changed to 2800000) - that end in 'k' to a number with 3 zeros (for example, 429k should be changed to 429000)

import re

def convert_size(size):
    # split a value such as '8.7M' into the number and its unit suffix
    match = re.fullmatch(r'([\d.]+)([Mk]?)', size)
    number, unit = float(match.group(1)), match.group(2)
    return round(number * {'M': 1_000_000, 'k': 1_000, '': 1}[unit])

data['Size'] = data['Size'].apply(convert_size)

Python
View
generate functionWed, 25 Jan 2023

There is a list with lists inside [['/world/'], ['/latest/'], ['/?updated=top'], ['/politics/36188461-s-marta-zhizn-rossiyan-suschestvenno-izmenitsya-iz-za-novyh-zakonov/'] ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/'], ['/science/36157853-nasa-sobiraet-ekstrennuyu-press-konferentsiyu-na-temu-vnezemnoy-zhizni/'], ['/video/36001498-poyavilis-pervye-podrobnosti-gibeli-natali-melamed/'], ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/?smi2=1'] ['/science/'], ['/sport/'], ['/middleeast/36131117-divizion-s-400-ne-zametil-ataki-f-35-pod-damaskom/'], ['/economics/36065674-rossiyane-vozmutilis-minimalnymi-zarplatami-v-stranah-es/']] 1) Modify the list to the Pandas dataframe 2) Filter out and leave only the url's with the news sctructure (containing 8 digits and heading) in it, using the str.contains method

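import pandas as pd

# lst is assumed to hold the list of single-element lists from the task
df = pd.DataFrame(lst, columns=['text'])
# news URLs contain an 8-digit id followed by a hyphenated heading
df[df['text'].str.contains(r'\d{8}-')]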

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Add an "av_ltv" column to the dataframe, in which you calculate the average lifetime of the users. Lifetime refers to the difference between the maximum and minimum values of the timestamp column for a given userId value.

df = pd.DataFrame(data = [[1, 31, 2.5, 1260759144], [1, 1029, 3.0, 1260759179], [1, 1061, 3.0, 1260759182], [1, 1129, 2.0, 1260759185], [1, 1172, 4.0, 1260759205]], columns = ['userId', 'movieId', 'rating', 'timestamp']) def average_lifetime(df): ''' df: input dataframe ''' df_max = df.groupby(['userId']).max() df_min = df.groupby(['userId']).min() df_final = pd.merge(df_max, df_min, on = ['userId'], suffixes = ('_max', '_min')) df_final['average_lifetime'] = df_final['timestamp_max'] - df_final['timestamp_min'] return df_final df = average

Python
View
generate functionSat, 28 Jan 2023

There is seaborn graph code sns.factorplot(x='date', y='rating', col='hotel', data=hotels_rating, col_wrap=3, kind='bar') plt.suptitle('hotel rating change',size=16) plt.subplots_adjust(top=.925) Limit the height of the columns on the y-axis to 3

# Import packages import numpy as np import pandas as pd import matplotlib.pyplot as plt import seaborn as sns # Initialize the figure f, ax = plt.subplots(figsize=(6.5, 6.5)) # Load the example car crash dataset crashes = sns.load_dataset("car_crashes").sort_values("total", ascending=False) # Plot the total crashes sns.set_color_codes("pastel") sns.barplot(x="total", y="abbrev", data=crashes, label="Total", color="b") # Plot the crashes where alcohol was involved sns.set_color_codes("muted") sns.barplot(x="alcohol", y="abbrev", data=crashes, label="Alcohol-involved", color="b") # Add a legend and informative axis label ax.legend(ncol=2, loc="lower right", frameon=True) ax.set(x

Python
View
generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to strings

def columns_to_str(df):
    # cast every column of the dataframe to str
    return df.astype(str)

df = columns_to_str(df)

Python
View
generate functionMon, 12 Dec 2022

Create a program that will take two lists of integers, a and b. Each list will consist of 3 positive integers above 0, representing the dimensions of cuboids a and b. You must find the difference of the cuboids' volumes regardless of which is bigger. For example, if the parameters passed are ([2, 2, 3], [5, 4, 1]), the volume of a is 12 and the volume of b is 20. Therefore, the function should return 8.

from math import prod

def find_difference(a, b):
    # absolute difference of the two cuboid volumes
    return abs(prod(a) - prod(b))

find_difference([2, 2, 3], [5, 4, 1])  # 8

Python
View
generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy Add it to output all 66 lines of the dataframe

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut', 'num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

# lift the row-display limit so that all 66 rows are printed (pandas imported as pd)
with pd.option_context('display.max_rows', None):
    print(year_leaders(dfp))

Python
View
generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that will replace all NaN values in the rectal_temp column with the median of that column

# df2 Pandas dataframe is given def fillna_median(df2): df2['rectal_temp'] = df2['rectal_temp'].fillna(df2['rectal_temp'].median()) return df2 fillna_median(df2)

Python
View
generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that finds the maximum values of each column and transposes them to rows

def colmax(df): maxvalues = [] for col in df.columns: maxvalues.append(df[col].max()) return pd.DataFrame(maxvalues, index=df.columns).T

Python
View
generate functionTue, 20 Dec 2022

Change the date_range function to return a list of all dates between start and end instead of return somelist from datetime import datetime as dt from datetime import timedelta as td def date_range(start_date, end_date): lst = [] If start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) return somelist except: return 'Incorrect date format' date_range('2022-12-01', '2022-12-20')

from datetime import datetime as dt
from datetime import timedelta as td

def date_range(start_date, end_date):
    try:
        start = dt.strptime(start_date, '%Y-%m-%d')
        end = dt.strptime(end_date, '%Y-%m-%d')
    except ValueError:
        return 'Incorrect date format'
    result = []
    # step one day at a time; the loop body never runs if start > end
    while start <= end:
        result.append(start.strftime('%Y-%m-%d'))
        start += td(days=1)
    return result

date_range('1992-09-01', '1992-09-04') # ['1992-09-01', '1992-09-02', '1992-09-03', '1992-09-04']

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Leave only 1 unique row in each 'song' column in case the 'peak_position' column has a value of 1

# keep only the rows that reached peak position 1, then one row per song
df[df['peak_position'] == 1].drop_duplicates(subset='song')

Python
View
generate functionSat, 25 Feb 2023

There is a Pandas dataframe: loan_amnt int_rate 0 5000 10.65% 1 2500 15.27% 2 2400 15.96% 3 10000 13.49% 4 3000 12.69% 5 5000 7.90% 6 7000 15.96% You need to split the values of the int_rate column into categories. Important: The int_rate column is in str data format. Write a function that will create the rate_group column and add values to it as follows: 1) if int_rate > 15.00%, rate_group should be '>15' 2) if int_rate is between 10.00% and 15.00%, then rate_group should be '10-15' You can use regex

import pandas as pd import re df = pd.DataFrame({'loan_amnt': [5000, 2500, 2400, 10000, 3000, 5000, 7000], 'int_rate': ['10.65%', '15.27%', '15.96%', '13.49%', '12.69%', '7.90%', '15.96%']}) df['rate_group'] = None def rate_group(a): if a > 15.00: return '>15' elif a >= 10.00 and a <= 15.00: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(lambda x: (re.sub('%', '', x))) df['rate_group'] = df['rate_group'].apply(lambda x: float(x)) df['rate_group'] = df['rate_group'].apply(rate_group) display(df)

Python
View
generate functionMon, 12 Dec 2022

Inspired by the development team at Vooza, write the function that accepts the name of a programmer, and returns the number of lightsabers owned by that person. The only person who owns lightsabers is Zach, by the way. He owns 18, which is an awesome number of lightsabers. Anyone else owns 0. Note: your function should have a default parameter. For example(Input --> Output): "anyone else" --> 0 "Zach" --> 18

def howManyLightSabersDoYouOwn(name="anyone else"): if name == "Zach": return 18 else: return 0

Python
View
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) calculates the average temperature in countries (average av_temp by decade column grouped by country) 2) build a list of the 20 coldest countries in ascending av_temp order

def average_temperature(df):
    # mean av_temp for every (country, decade) pair
    avg_temp_by_country = df.groupby(['country', 'decade'])['av_temp'].mean().reset_index()
    coldest_countries = avg_temp_by_country.sort_values('av_temp', ascending=True).head(20)
    coldest_countries_list = list(coldest_countries['country'])
    return avg_temp_by_country, coldest_countries_list

Python
View
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id for each line of the pandas dataframe

import uuid df_new['id'] = df_new.apply(lambda row: uuid.uuid4(), axis=1)

Python
View
generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

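from math import sqrt
from statistics import NormalDist

def interval(n, mean, sig, conf):
    # two-sided normal quantile, about 1.96 for conf = 0.95
    z = NormalDist().inv_cdf((1 + conf) / 2)
    h = z * sig / sqrt(n)
    # the interval is mean ± h, so its length is 2 * h
    return round(2 * h)

interval(100, 6, 2, 0.95)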
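For reference, the length the task asks for can be written out as below. This is a sketch under the normal approximation (a z quantile); with a Student's t quantile the structure is the same. The numbers are the arguments of the call `interval(100, 6, 2, 0.95)`, and the symbol names simply mirror the function parameters.

```latex
h = z_{(1+\mathrm{conf})/2}\,\frac{\mathrm{sig}}{\sqrt{n}},
\qquad
\text{length} = (\mathrm{mean} + h) - (\mathrm{mean} - h) = 2h

% n = 100,\ \mathrm{sig} = 2,\ \mathrm{conf} = 0.95,\ z_{0.975} \approx 1.96
h \approx 1.96 \cdot \frac{2}{\sqrt{100}} = 0.392,
\qquad
2h \approx 0.784
```

Rounded to an integer, that length is 1. The sample mean cancels out of the length, so it does not affect the result.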

Python
View
generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 210 g Tomatoes: 6 pcs. Cucumbers: 60g Olives: 30 g Olive oil: 60 ml Lettuce: 30 gr Pepper: 60 gr

def multiply_cook_book(portions): new_book = {} for key, value in cook_book.items(): new_book[key] = [] for d in value: new_book[key].append({'ingridient_name': d['ingridient_name'], 'quantity': d['quantity'] * portions, 'measure': d['measure']}) return new_book print(multiply_cook_book(2))

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe in this format: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singing in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 The Seventh Seal (1957) 5.0 9 Bridge Over the River Kwai (1957) 4.0 Write a function that will pull the year of the movie from its title and add it to the "year" column

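def add_year(df):
    # the year is the four-digit number in the last pair of parentheses
    df['year'] = df['title'].str.extract(r'\((\d{4})\)\s*$', expand=False)
    return df

df = add_year(df)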

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe: hotel how_find_us Alpina agg_101hotels.com 2 agg_airbnb.com 4 agg_booking.com 3 agg_bronevik.com 1 agg_hotellook.ru 4 agg_level.travel 2 agg_onetwotrip.com 2 agg_onlinetours.ru 2 agg_ostrovok.ru 1 agg_other_sites 1 agg_roomguru.ru 2 agg_sutochno.ru 6 agg_travelata.ru 3 agg_tripadvisor.ru 3 agg_trivago.com 6 agg_tvil.ru 3 agg_yandex_travel 5 by_recommendation 8 facebook_adv 1 google_adv 1 instagram_adv 6 outdoor 6 regular_customer 1 seo 5 social 3 telegram_adv 2 tour_agents 3 unknown 4 vk_adv 3 yandex_adv 10 Write a function that creates a new dataframe by grouping the values in the "how_find_us" column according to the following rules: 1) if there are "agg" in the row, replace all row values with "aggregators" 2) if there are words "facebook" or "vk" or "instagram" or "telegram" in the line, replace these values with "social"

import pandas as pd def find_us(val): if val.startswith('agg'): val = 'aggregators' elif val.startswith('vk_adv') or val.startswith('facebook_adv') or val.startswith('instagram_adv') or val.startswith('telegram_adv'): val = 'social' return val df = pd.read_csv('/datasets/how_find_us.csv') df['how_find_us'] = df['how_find_us'].apply(find_us) df['how_find_us'].value_counts()

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new data frame and leave only the rows that have the maximum value of num_of_hits in it when grouped by the chart_debut column

def year_leaders(df):
    # maximum num_of_hits for every chart_debut year
    leaders = df.groupby(by=['chart_debut']).agg({'num_of_hits': 'max'}).reset_index()
    # keep only the rows that match a (year, maximum) pair
    return df.merge(leaders, on=['chart_debut', 'num_of_hits'], how='inner')

year_leaders(df)

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column. All values of both time_on_chart and consecutive_weeks columns must be summed if the performer value matches.

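def group_by_performer(df):
    # one row per performer: unique songs joined with ", ", numeric columns summed
    return (df.groupby('performer', as_index=False)
              .agg(hits=('song', lambda s: ', '.join(s.unique())),
                   time_on_chart=('time_on_chart', 'sum'),
                   consecutive_weeks=('consecutive_weeks', 'sum')))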

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users who gave more than 100 ratings. The lifetime is the difference between the maximum and minimum values of the timestamp column for a given userId.

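# loadmovies() is assumed to be defined elsewhere and to return the ratings data
data = loadmovies()
pdata = pd.DataFrame(data)
grouped = pdata.groupby('userId')['timestamp'].agg(['max', 'min'])
grouped['diff'] = grouped['max'] - grouped['min']
# average the lifetime only over users with more than 100 ratings
counts = pdata.groupby('userId')['rating'].count()
grouped.loc[counts > 100, 'diff'].mean()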

Python
View
generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

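misclassified = []
# X_test and y_test are assumed to be pandas Series that keep their original
# row labels after the split, so use positional access (.iloc), not labels
for index, label_predicted in enumerate(predicted):
    if label_predicted != y_test.iloc[index]:
        misclassified.append({'message': X_test.iloc[index], 'actual': y_test.iloc[index], 'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)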

Python
View
generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#). Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math class Point: def __init__(self, x, y): self.x = x self.y = y def distance(a, b): return math.sqrt((a.x - b.x)**2 + (a.y - b.y)**2) test.assert_approx_equals(distance(Point(0,0), Point(0,1)), 1) test.assert_approx_equals(distance(Point(0,0), Point(1,0)), 1) test.assert_approx_equals(distance(Point(0,0), Point(3,4)), 5)
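The `test.assert_approx_equals` calls above appear to come from the kata's test harness and are not defined in a plain Python session. A minimal self-contained sketch of the same check, using `math.hypot` and ordinary assertions rounded to 6 decimal places as the task describes, could look like this:

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def distance(a, b):
    # math.hypot returns sqrt(dx**2 + dy**2)
    return math.hypot(a.x - b.x, a.y - b.y)

# the same three cases as above, compared after rounding to 6 decimals
assert round(distance(Point(0, 0), Point(0, 1)), 6) == 1.0
assert round(distance(Point(0, 0), Point(1, 0)), 6) == 1.0
assert round(distance(Point(0, 0), Point(3, 4)), 6) == 5.0
```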

Python
View
generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: water, load (or max_load in Ruby) and clothes.

def how_much_water(water, load, clothes): if clothes <= load: return water else: return water * 1.1 ** (clothes - load)
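As a quick check against the worked example in the task (load 10, water 5, clothes 14), the formula gives 5 * 1.1^4 = 5 * 1.4641 = 7.3205. The sketch below repeats the function so that it runs on its own; the `max(..., 0)` form is just an alternative way to write the same branch.

```python
def how_much_water(water, load, clothes):
    # items above the rated load; never negative
    extra = max(clothes - load, 0)
    # each extra item multiplies the water use by 1.1
    return water * 1.1 ** extra

result = how_much_water(5, 10, 14)
print(round(result, 4))
```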

Python
View
generate functionMon, 26 Dec 2022

There are 2 functions. One looks for duplicate values and makes a dictionary. The second changes the duplicates to the desired value. The first function (find_duplicates) works correctly. But the second function (change_duplicates) is incomplete. Finish the code so that the duplicate student number changes to the one in brackets in the change_duplicates function. def find_duplicates(lst): stnums = [] cnt = {} for el in lst: stnums.append(el[0]) for i in stnums: if i in cnt: cnt[i] += 1 else: cnt[i] = 1 return cnt def change_duplicates(lst, replacing_num): stnums = [] duplicates = find_duplicates(lst) for k, v in duplicates.items(): if v > 1: stnums = [k] for student in lst: if stnums in lst: #need right code#

def find_duplicates(lst):
    stnums = []
    cnt = {}
    for el in lst:
        stnums.append(el[0])
    for i in stnums:
        if i in cnt:
            cnt[i] += 1
        else:
            cnt[i] = 1
    return cnt

def change_duplicates(lst, replacing_num):
    duplicates = find_duplicates(lst)
    seen = set()
    for student in lst:
        number = student[0]
        # keep the first occurrence, replace every later repeat
        if duplicates[number] > 1 and number in seen:
            student[0] = replacing_num
        seen.add(number)
    return lst

Python
View
generate functionFri, 23 Dec 2022

You have a bunch of red and blue marbles. To start the game you grab a handful of marbles of each color and put them into the bag, keeping track of how many of each color go in. You take turns reaching into the bag, guessing a color, and then pulling one marble out. You get a point if you guessed correctly. You've decided to write a function, guessBlue() to help automatically calculate whether you should guess "blue" or "red". The function should take four arguments: the number of blue marbles you put in the bag to start the number of red marbles you put in the bag to start the number of blue marbles pulled out so far (always lower than the starting number of blue marbles) the number of red marbles pulled out so far (always lower than the starting number of red marbles) guessBlue() should return the probability of drawing a blue marble, expressed as a float. For example, guessBlue(5, 5, 2, 3) should return 0.6.

def guess_blue(blue_start, red_start, blue_pulled, red_pulled): total_start = blue_start + red_start total_pulled = blue_pulled + red_pulled return (blue_start - blue_pulled) / (total_start - total_pulled)
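The example from the task can be checked by hand: with 5 blue and 5 red marbles at the start and 2 blue and 3 red already pulled, 3 blue marbles remain out of 5 in the bag, so the probability is 3 / 5 = 0.6. A self-contained sketch of that check (the variable names `blue_left` and `total_left` are only illustrative):

```python
def guess_blue(blue_start, red_start, blue_pulled, red_pulled):
    # marbles still in the bag
    blue_left = blue_start - blue_pulled
    total_left = (blue_start + red_start) - (blue_pulled + red_pulled)
    return blue_left / total_left

probability = guess_blue(5, 5, 2, 3)
print(probability)
```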

Python
View
generate functionMon, 27 Mar 2023

There is a dataframe df with two columns: Category and Message in the following format: Category Message 0 ham Go until jurong point, crazy... Available only ... 1 ham Ok lar... Joking wif u oni... 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 3 ham U dun say so early hor... U c already then say... 4 ham Nah I don't think he goes to usf, he lives aro... Split it into samples X = df['Message'] y = df['Category'] so that there are float numbers instead of object text everywhere.

import pandas as pd import numpy as np df = pd.DataFrame({'Category':['ham','ham','spam','ham','ham'], 'Message':['Go until jurong point, crazy... Available only ...','Ok lar... Joking wif u oni...','Free entry in 2 a wkly comp to win FA Cup fina...','U dun say so early hor... U c already then say...','Nah I don\'t think he goes to usf, he lives aro...']}) df['Category'] = df['Category'].map(lambda x : 1 if x == 'ham' else 0)

Python
View
generate functionThu, 23 Mar 2023

Write code in Python that downloads all the names of the text files in the folder: http://stup.pro/wp-content/uploads/2023/03/*.txt into one list book_files

import requests
from bs4 import BeautifulSoup

def get_all_files_names(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # collect the href of every link on the page
    links = [link.get('href') for link in soup.find_all('a')]
    # skip anchors without an href and keep only the .txt files
    return [href for href in links if href and href.endswith('.txt')]

book_files = get_all_files_names('http://stup.pro/wp-content/uploads/2023/03/')

Python
View
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type

# count incidents per route and type, most frequent first
df_new.groupby(['route', 'incident_type']).size().sort_values(ascending=False)

Python
View
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) leave only the rows in the year column with a value higher than 1980 2) calculates the average temperature of the countries (the average of the av_temp column) 3) builds a list of the 20 coldest countries in ascending av_temp order

def filter_1980(df):
    # keep only the rows recorded after 1980
    return df[df["year"] > 1980]

def avg_temp(df):
    # mean av_temp per country
    return df.groupby("country")["av_temp"].mean()

def list_coldest(df, n=20):
    # the n countries with the lowest mean temperature, coldest first
    return list(avg_temp(df).nsmallest(n).index)

new_df = filter_1980(df)
avg_temp(new_df)
list_coldest(new_df)

Python
View
generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

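from scipy.stats import ttest_ind

def test_result(drugA, drugB, alpha=0.05):
    # H0: both drugs have the same mean effect; H1: the means differ
    # alpha is an assumed significance level, not given in the task
    statistic, p_value = ttest_ind(drugA, drugB)
    # H0 is rejected when the p-value is below alpha
    return statistic, p_value, p_value < alpha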

Python
View
generate functionSat, 18 Feb 2023

There are 2 charts in Python: df.plot(kind='scatter', x='User_Score', y='Global_Sales', plt.ylim(0, 40) df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', plt.ylim(0, 40) Rewrite the code so that the two charts are not separate but are subplots.

import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=1, ncols=2)
# keep the y-axis limit of the original charts on both subplots
df.plot(kind='scatter', x='User_Score', y='Global_Sales', ylim=(0, 40), ax=axes[0])
df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', ylim=(0, 40), ax=axes[1])
plt.show()

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 We need to find customers who have contacted us several times and make a separate table with them. That is, we need to find duplicate values in the name column and leave only them, all the rest removed. Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

# pandas dataframe df = pd.DataFrame({'name': ['Ksenia Rodionova', 'Ulyana Selezneva', 'Konstantin Prokhorov', 'Petrov Vladimir', 'Arina Selivanova', 'Artur Petrov', 'Ivan Sidorov', 'Ksenia Rodionova', 'Ksenia Rodionova'], 'date': ['2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01']}) df.head(10) df_new = df.loc[df.duplicated(subset=['name'], keep=False), :] df_new

Python
View
generate functionWed, 22 Feb 2023

Modify this code to make a vertical bar graph instead of a pie chart (plotly.express library) question6 = "How likely would you work for a company whose mission is not bringing social impact ?" question6 = data[question6].value_counts() label = question6.index counts = question6.values colors = ['gold','lightgreen'] fig = go.Figure(data=[go.Pie(labels=label, values=counts)]) fig.update_layout(title_text='How likely would you work for a company whose mission is not bringing social impact?') fig.update_traces(hoverinfo='label+value', textinfo='percent', textfont_size=30, marker=dict(colors=colors, line=dict(color='black', width=3))) fig.show()

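import plotly.graph_objects as go

def vertical_bar_chart(question):
    # question is the question text, i.e. the column name in data
    counts = data[question].value_counts()
    colors = ['gold', 'lightgreen']
    fig = go.Figure(data=[go.Bar(x=counts.index, y=counts.values, marker_color=colors)])
    # use the question text, not the value counts, as the title
    fig.update_layout(title_text=question)
    fig.show()

# question6 must still hold the question string here, not its value counts
vertical_bar_chart(question6)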

Python
View
generate functionSun, 18 Dec 2022

Replace re.split with regular split to get the same result (you can use the json library and the loads function) The code: f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

import json

purchases = {}
with open('purchase_log.txt', encoding='utf-8') as f:
    for line in f:
        # each line is assumed to be a JSON object with these two keys
        record = json.loads(line.strip())
        purchases[record['user_id']] = record['category']

Python
View
generate functionMon, 26 Dec 2022

Write a code that looks for a repeating student number and replaces it with "9090". lst = [ ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"] ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

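def replace_number(lst, replacement="9090"):
    seen = set()
    for row in lst:
        if row[0] in seen:
            # A later occurrence of an already-seen number gets the replacement
            row[0] = replacement
        else:
            seen.add(row[0])
    return lst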

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

def decade(year): if year >= 1900 and year <= 1910: return "1900-1910" elif year > 1910 and year <= 1920: return "1910-1920" elif year > 1920 and year <= 1930: return "1920-1930" elif year > 1930 and year <= 1940: return "1930-1940" elif year > 1940 and year <= 1950: return "1940-1950" elif year > 1950 and year <= 1960: return "1950-1960" elif year > 1960 and year <= 1970: return "1960-1970" elif year > 1970 and year <= 1980: return "1970-1980" elif year > 1980 and year <= 1990: return "1980-1990" elif year > 1990 and year <= 2000: return "1990-2000" elif year > 2000 and year <= 2010: return "2000-2010" elif year > 2010 and year <= 2020: return "2010-2020

Python
View
generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

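from sklearn.model_selection import train_test_split

def split_data(X, y, test_size=0.3, random_state=42):
    # The ValueError means X and y have different numbers of rows (8082 vs 5572);
    # both must be built from the same rows before they can be split
    if X.shape[0] != len(y):
        raise ValueError(f'X has {X.shape[0]} rows but y has {len(y)}')
    return train_test_split(X, y, test_size=test_size, random_state=random_state)

X_train, X_test, y_train, y_test = split_data(tfidf, cats)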

Python
View
generate functionSat, 25 Feb 2023

This code gives an error "expected string or bytes-like object" Refactor it: import re def rate_group(int_rate): int_rate = re.sub('%', '', int_rate) int_rate = float(int_rate) if int_rate > 15.0: return '>15' elif 10.0 < int_rate <= 15.0: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(rate_group)

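import pandas as pd

def rate_group(int_rate):
    # Missing values (NaN) are not strings, which is what triggers the error
    if pd.isna(int_rate):
        return None
    # str() lets the function accept both '13.5%' and plain numbers
    int_rate = float(str(int_rate).replace('%', ''))
    if int_rate > 15.0:
        return '>15'
    elif 10.0 < int_rate <= 15.0:
        return '10-15'
    else:
        return '<10'

df['rate_group'] = df['int_rate'].apply(rate_group)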

Python
View
generate functionSun, 25 Dec 2022

Task Give you two strings: s1 and s2. If they are opposite, return true; otherwise, return false. Note: The result should be a boolean value, instead of a string. The opposite means: All letters of the two strings are the same, but the case is opposite. you can assume that the string only contains letters or it's a empty string. Also take note of the edge case - if both strings are empty then you should return false/False. Examples (input -> output) "ab","AB" -> true "aB","Ab" -> true "aBcd","AbCD" -> true "AB","Ab" -> false "","" -> false

def is_opposite(s1, s2): if not s1 and not s2: return False if s1 and s2: return s1.swapcase() == s2 return False

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 3819 Imagine Dragons Radioactive, Natural, Believer, Thunder, On To... 2012 11101 9304.0 2674 Ed Sheeran Thinking Out Loud, Photograph, Sing, Don't, On... 2014 10685 10039.0 9590 The Weeknd Starry Eyes, Save Your Tears, Acquainted, Blin... 2022 9429 6713.0 5812 Mariah Carey Without You/Never Forget You, Don't Forget Abo... 1994 8735 6616.0 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0 Write a function num_of_hits that will sort the dataframe by the number of values (separated by commas) in the hits column

def num_of_hits(df): df['num_of_hits'] = df.hits.apply(lambda x: len(x.split(', '))) return df.sort_values(by='num_of_hits', ascending=False) num_of_hits(df)

Python
View
generate functionWed, 21 Dec 2022

Create a function that finds the key with the maximum value in 'Value' and displays it in key - value format: {'AUD': {'ID': 'R01010', {'NumCode': '036', {'CharCode': 'AUD', {'Nominal': 1, 'Name': 'Australian Dollar', 'Value': '46.9983, 'Previous': 45.9496}, 'AZN': {'ID': 'R01020A', 'NumCode': '944', 'CharCode': 'AZN', 'Nominal': 1, 'Name': 'AZN', Value: 41.4856, Previous': 40.5904}, 'GBP': {'ID': 'R01035', 'NumCode': '826', 'CharCode': 'GBP', 'Nominal': 1, 'Name': 'Pound Sterling United Kingdom', 'Value': 85.611, 'Previous': 83.7015},

def find_max(dictionary): """ Dictionary -> String :param dictionary: example {'AUD': {'ID': 'R01010', 'NumCode': '036', 'CharCode': 'AUD', 'Nominal': 1, 'Name': 'Australian Dollar', 'Value': '46.9983, 'Previous': 45.9496}, 'AZN': {'ID': 'R01020A', 'NumCode': '944', 'CharCode': 'AZN', 'Nominal': 1, 'Name': 'AZN', 'Value': 41.4856, 'Previous': 40.5904}, 'GBP': {'ID': 'R01035', 'NumCode': '826', 'CharCode': 'GBP', 'Nominal': 1, 'Name': 'Pound Sterling United Kingdom', 'Value': 85.
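The generated function above is cut off inside its docstring. As a minimal sketch of one way to solve the task, assuming every entry has a 'Value' that is either a number or a numeric string (the sample data mixes both), the code below picks the key with the largest value. The name find_max_value and the trimmed rates dictionary are illustrative, not part of the original answer.

```python
def find_max_value(rates):
    """Return 'KEY - value' for the entry with the largest 'Value'."""
    # float() copes with values stored as strings such as '46.9983'
    best_key = max(rates, key=lambda code: float(rates[code]['Value']))
    return f"{best_key} - {rates[best_key]['Value']}"


# Trimmed-down sample in the shape of the task's dictionary (hypothetical subset)
rates = {
    'AUD': {'Name': 'Australian Dollar', 'Value': '46.9983'},
    'AZN': {'Name': 'AZN', 'Value': 41.4856},
    'GBP': {'Name': 'Pound Sterling United Kingdom', 'Value': 85.611},
}
print(find_max_value(rates))
```

Converting with float() inside the key function leaves the stored values untouched, so the printed value keeps its original form.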

Python
View
generate functionSun, 11 Dec 2022

Change the code in this program so that the value of 'quantity' is multiplied by the variable 'portions' and the result is displayed: cook_book = { 'пицца': [ {'ingridient_name': 'сыр', 'quantity': 20, 'measure': 'гр'}, {'ingridient_name': 'колбаса', 'quantity': 30, 'measure': 'гр'}, {'ingridient_name': 'бекон', 'quantity': 30, 'measure': 'гр'}, {'ingridient_name': 'оливки', 'quantity': 10, 'measure': 'гр'}, ], 'лимонад': [ {'ingridient_name': 'лимон', 'quantity': 1, 'measure': 'шт'}, {'ingridient_name': 'вода', 'quantity': 200, 'measure': 'мл'}, {'ingridient_name': 'сахар', 'quantity': 10, 'measure': 'гр'}, {'ingridient_name': 'лайм', 'quantity': 20, 'measure': 'гр'}, ] } dish = input('Выберите блюдо: cалат, пицца, лимонад ') portions = int(input('Введите количество порций: ')) ingredients = cook_book[dish] for i in ingredients: for value in i.values(): print(value)

def dish(dish, portions): cook_book = { 'пицца': [ {'ingridient_name': 'сыр', 'quantity': 20, 'measure': 'гр'}, {'ingridient_name': 'колбаса', 'quantity': 30, 'measure': 'гр'}, {'ingridient_name': 'бекон', 'quantity': 30, 'measure': 'гр'}, {'ingridient_name': 'оливки', 'quantity': 10, 'measure': 'гр'}, ], 'лимонад': [ {'ingridient_name': 'лимон', 'quantity': 1, 'measure': 'шт'}, {'ingridient_name': 'вода', 'quantity': 200, 'measure': 'мл'}, {'ing
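The generated answer above is truncated before it reaches the multiplication. A minimal sketch of the missing step, assuming the same cook_book layout as in the task, is shown below. The helper name scaled_ingredients and the small English sample are illustrative only.

```python
def scaled_ingredients(cook_book, dish, portions):
    """Return (name, quantity * portions, measure) tuples for one dish."""
    result = []
    for item in cook_book[dish]:
        result.append((item['ingridient_name'], item['quantity'] * portions, item['measure']))
    return result


# Small sample in the same shape as the task's cook_book (hypothetical data)
cook_book = {
    'lemonade': [
        {'ingridient_name': 'lemon', 'quantity': 1, 'measure': 'pcs'},
        {'ingridient_name': 'water', 'quantity': 200, 'measure': 'ml'},
    ],
}
for name, quantity, measure in scaled_ingredients(cook_book, 'lemonade', 3):
    print(f'{name}: {quantity} {measure}')
```

Returning new tuples instead of overwriting 'quantity' keeps the recipe book reusable for a second request with a different number of portions.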

Python
View
generate functionSat, 18 Feb 2023

There are 2 charts in Python: df.plot(kind='scatter', x='User_Score', y='Global_Sales', plt.ylim(0, 40) df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', plt.ylim(0, 40) Rewrite the code so that the two charts are not separate but are subplots.

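import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))
# Draw each scatter plot on its own axes instead of in a separate figure
df.plot(kind='scatter', x='User_Score', y='Global_Sales', ax=ax[0])
df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', ax=ax[1])
for a in ax:
    a.set_ylim(0, 40)
plt.show()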

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write code that shows the best decade with a maximum rating of 5.0

import pandas as pd df = pd.DataFrame( {'id': [4728, 35638, 21445, 40291, 29462, 46978, 42931, 38670, 25506, 43989], 'title': ['Bad Education (2019)', 'Palooka (1934)', 'High Moon (2019)', 'Saint Maud (2019)', 'Mad at the Moon (1992)', 'The Butterfly Ball (1977)', 'Snowboarďáci (2004)', 'Recon 2020: The Caprini Massacre (2004)', 'Karagoez catalogo 9,5 (1983)', 'Starting a Skyscraper (1902)'], 'rating': [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], 'decade': ['2010-2020', '1934', '2010-2020', '2010-2020', '1990-2000', '1970-1980', '

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding values of this column are written through the "," sign into a new column hits

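def group(df):
    # One row per performer; distinct songs are joined into a comma-separated "hits" column
    return (df.groupby('performer')['song']
              .apply(lambda s: ', '.join(s.unique()))
              .reset_index(name='hits'))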

Python
View
generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g

def multiply(dictionary): for i in dictionary[key]: i['quantity'] = i['quantity'] * portions return dictionary multiply(cook_book)

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

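def year_leaders(df):
    # Keep the rows whose num_of_hits equals the maximum within their chart_debut year
    is_leader = df.groupby('chart_debut')['num_of_hits'].transform('max') == df['num_of_hits']
    return df.loc[is_leader, ['performer', 'hits']]

year_leaders(df)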

Python
View
generate functionTue, 21 Mar 2023

Rewrite this code to calculate statistics by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

import scipy.stats as stats

def wilcoxon_test(data1, data2, alpha=0.05):
    # stats.wilcoxon is the signed-rank test and needs paired samples of equal length;
    # the two versions are independent groups, so the Wilcoxon rank-sum test is used here
    stat, p = stats.ranksums(data1, data2)
    print('Wilcoxon rank-sum Statistics=%.3f, p=%.3f' % (stat, p))
    if p > alpha:
        print('Same distribution (fail to reject H0)')
    else:
        print('Different distribution (reject H0)')

wilcoxon_test(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds'])

Python
View
generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that 1) create a new dataframe hot_years 2) group data from df_ru by "year" column and take average values in "av_temp" column 3) leave in the hot_years dataframe only those lines where av_temp > 15

import pandas as pd

df_ru = pd.read_csv('../data/ft_ru.csv', parse_dates=[0], index_col=[0], dayfirst=True)

def hot_years(df):
    # Average only the av_temp column per calendar year (text columns cannot be averaged)
    yearly = df[['av_temp']].groupby(df.index.year).mean()
    yearly.index.name = 'year'
    return yearly[yearly['av_temp'] > 15]

hot_years(df_ru)

Python
View
generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

import matplotlib.pyplot as plt import numpy as np def plot(A, B): plt.boxplot([A, B]) plt.show() plt.hist([A, B]) plt.show() A = [1, 2, 3] B = [4, 3, 2] plot(A, B)
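The answer above only draws the plots and skips the test itself. A minimal sketch of the calculation, using only the standard library and assuming equal variances in both groups (the classic pooled Student's t-test), follows. The null hypothesis H0 is that the two drugs have the same mean effect, and the alternative H1 is that the means differ. The function name student_t is illustrative.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Return (t, degrees_of_freedom) for the pooled two-sample Student's t-test."""
    n1, n2 = len(a), len(b)
    # Pooled variance assumes both groups share the same population variance
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


A = [1, 2, 3]
B = [4, 3, 2]
t, dof = student_t(A, B)
print(f't = {t:.3f}, degrees of freedom = {dof}')
```

H0 is rejected when |t| exceeds the critical value of the t-distribution for the chosen significance level and these degrees of freedom. If SciPy is available, scipy.stats.ttest_ind(A, B) returns the same statistic together with the p-value.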

Python
View
generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values in these two lists and replaces the second, third and all other subsequent values with replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace(stnum, students, replacing_num): result = students.copy() for elem in students: if elem[0] == stnum: for i in range(1, len(elem)): elem[i] = replacing_num return result stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161
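The generated answer above breaks off inside the data. As a hedged sketch of the requested behaviour, the code below keeps the first row with a listed student number and puts replacing_num into the number of every later match. It assumes that "values" in the task means the student number in the first column. The name replace_repeats and the shortened rows are illustrative.

```python
def replace_repeats(stnums, students, replacing_num):
    """Replace the number in every repeat (2nd, 3rd, ...) of a listed student number."""
    seen = set()
    result = []
    for row in students:
        row = list(row)  # copy so the input rows stay untouched
        number = row[0]
        if number in stnums:
            if number in seen:
                row[0] = replacing_num
            seen.add(number)
        result.append(row)
    return result


# Shortened rows in the shape of the task's students list (hypothetical subset)
students = [
    ['4004', 'Stepanova'],
    ['0045', 'Bokov'],
    ['4004', 'Potapov'],
]
print(replace_repeats(['4004'], students, '9090'))
```

Copying each row matters here: students.copy() alone is a shallow copy, so editing the inner lists would also change the original data.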

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 in this dataframe the names of films are not unique, because the same film may have been given different ratings Your task is to write a function that calculates the average rating of each movie and removes non-unique strings

def avgrating(df):
    # One row per title with its mean rating
    return df.groupby('title', as_index=False)['rating'].mean()

Python
View
generate functionSun, 18 Dec 2022

Write a function that will check if two given characters are the same case. If either of the characters is not a letter, return -1 If both characters are the same case, return 1 If both characters are letters, but not the same case, return 0 Examples 'a' and 'g' returns 1 'A' and 'C' returns 1 'b' and 'G' returns 0 'B' and 'g' returns 0 '0' and '?' returns -1

def same_case(a, b): if a.isalpha() and b.isalpha(): if a.islower() and b.islower(): return 1 elif a.isupper() and b.isupper(): return 1 else: return 0 else: return -1 same_case('a', 'g') # 1 same_case('A', 'C') # 1 same_case('b', 'G') # 0 same_case('B', 'g') # 0 same_case('0', '?') # -1

Python
View
generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that 1) create a new dataframe hot_years 2) group data from df_ru by "year" column and take average values in "av_temp" column 3) leave in the hot_years dataframe only those lines where av_temp > 15

def hot_years(df):
    # Average av_temp per year, then keep only the years warmer than 15
    hot_years = df.groupby('year', as_index=False)['av_temp'].mean()
    return hot_years[hot_years['av_temp'] > 15]

hot_years(df_ru)

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

def duplicates(df): return df[df.name.duplicated(keep=False)].sort_values('name') duplicates(df)

Python
View
generate functionThu, 15 Dec 2022

Create a function that outputs directory numbers (dictionary keys) by the selected value. directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } For example: input = 2207 876234 output = 1 input = 11-2 output = 1 input = 10006 output = 2 input = 1434 output = 'Nothing found'

def search_goods(): global directories input_number = input('Input directory number: ') for key, value in directories.items(): for item in value: if input_number == item: return key return 'Nothing found'

Python
View
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country 1 1743-12-01 NaN NaN Åland 2 1744-01-01 NaN NaN Åland 3 1744-02-01 NaN NaN Åland 4 1744-03-01 NaN NaN Åland 9 1744-08-01 NaN NaN Åland Create a new column year from date in which you convert all date values to years. Example: 1744-01-01 Date column: 1744

def convert_to_year(x): try: date = pd.to_datetime(x) return date.year except: return None df['year'] = df.date.apply(convert_to_year)

Python
View
generate functionTue, 21 Mar 2023

Rewrite this code to calculate Wilcoxon statistics stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

from scipy.stats import ranksums

def wilcoxon(a, b):
    # Wilcoxon rank-sum test for two independent groups
    # (scipy.stats.wilcoxon itself is the signed-rank test for paired samples)
    stat, p = ranksums(data[data['version'] == a]['sum_gamerounds'], data[data['version'] == b]['sum_gamerounds'])
    print('Wilcoxon rank-sum Statistics=%.3f, p=%.3f' % (stat, p))

wilcoxon('gate_30', 'gate_40')

Python
View
generate functionMon, 26 Dec 2022

A matrix matrix is given. matrix = [ [0,1,2,4,8], [6,2,2,1,9], [3,3,3,3,3], [4,6,7,1,2], [5,7,3,4,0] ] Write code (in pure Python, no third-party libraries, no list comprehensions) that will transpose it (that is, make columns out of rows) into matrix_t: matrix_t = [ [0, 6, 3, 4, 5], [1, 2, 3, 6, 7], [2, 2, 3, 7, 3], [4, 1, 3, 1, 4], [8, 9, 3, 2, 0] ]

def transpose(matrix): ''' Transpose a matrix ''' matrix_t = [] for i in range(len(matrix[0])): row = [] for j in range(len(matrix)): row.append(matrix[j][i]) matrix_t.append(row) return matrix_t

Python
View
generate functionSun, 11 Dec 2022

Change the code in this program so that the value of 'quantity' is multiplied by the variable 'portions' and the result is displayed: cook_book = { 'pizza': [ {'ingridient_name': 'cheese', 'quantity': 20, 'measure': 'g'}, {'ingridient_name': 'sausage', 'quantity': 30, 'measure': 'gr'}, {'ingridient_name': 'bacon', 'quantity': 30, 'measure': 'gr'} {'ingridient_name': 'olives', 'quantity': 10, 'measure': 'gr'} ]} dish = input('choose a dish: salad, pizza, lemonade ') portions = int(input('Enter number of portions: ')) ingredients = cook_book[dish] for i in ingredients: for value in i.values(): print(value)

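def change_quantity(ingredients, portions):
    # Print each ingredient with its quantity scaled by the number of portions
    for item in ingredients:
        print(item['ingridient_name'], item['quantity'] * portions, item['measure'])

change_quantity(ingredients, portions)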

Python
View
generate functionWed, 22 Feb 2023

Modify this code to make a horizontal bar graph instead of a vertical (plotly.express library) question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = go.Figure(data=[go.Bar(x=label, y=counts)]) fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

import plotly.express as px

def horizontal_bar(question, title):
    counts = df[question].value_counts()
    # For a horizontal chart the counts go on x and the categories on y
    fig = px.bar(x=counts.values, y=counts.index, orientation='h')
    fig.update_layout(title_text=title)
    fig.show()

horizontal_bar('What is the most preferred working environment for you.', 'Какая рабочая среда для вас наиболее предпочтительна?')

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have rated 100 or more in the rating column

def active_users(df, min_ratings=100):
    # Count the ratings given by each user and keep the rows of users with at least min_ratings
    counts = df.groupby('userId')['rating'].transform('count')
    return df[counts >= min_ratings]

df = active_users(df)

Python
View
generate functionThu, 15 Dec 2022

Refine the function below to handle errors 1) if shelf variable is not found in directories, it will show ERROR NO SUCH KEY 2) if doc variable is not found in directories, it will show ERROR NO SUCH VALUE def change_shelf(data): docnum = '11-2' shelf = '3' if list(docnum) in data.values(): print('Error') for key, val in data.items(): for doc in val: if doc == docnum: data[shelf].append(doc) data[key].remove(doc) print('OK') return data

def change_shelf(data, docnum='11-2', shelf='3'):
    if shelf not in data:
        print('ERROR NO SUCH KEY')
        return data
    for key, docs in data.items():
        if docnum in docs:
            # Move the document from its current shelf to the target shelf
            docs.remove(docnum)
            data[shelf].append(docnum)
            print('OK')
            return data
    print('ERROR NO SUCH VALUE')
    return data

Python
View
generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

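from scipy import stats
from math import sqrt

def interval(n, mean, sig, conf):
    # Half-width of the interval: z * sig / sqrt(n), with z the two-sided normal quantile
    h = sig * stats.norm.ppf((1 + conf) / 2) / sqrt(n)
    # Round (rather than truncate) the full length 2h to the nearest integer
    return round(2 * h)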
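The function above relies on SciPy for the normal quantile. As a sketch of the same calculation with the standard library only, assuming a known standard deviation and a normal sampling distribution, statistics.NormalDist can supply the quantile. The name interval_stdlib is illustrative, and the mean parameter is kept only to match the task's signature, since the length of the interval does not depend on it.

```python
from math import sqrt
from statistics import NormalDist


def interval_stdlib(n, mean, sig, conf):
    """Length of the confidence interval for the mean, rounded to an integer."""
    # Two-sided quantile: for conf = 0.95 this is about 1.96
    z = NormalDist().inv_cdf((1 + conf) / 2)
    h = z * sig / sqrt(n)  # half-width of the interval
    return round(2 * h)


print(interval_stdlib(100, 50, 10, 0.95))
```

With n = 100, sig = 10 and conf = 0.95 the half-width is about 1.96, so the interval is about 3.92 long and rounds to 4.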

Python
View
generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

import matplotlib.pyplot as plt

def linegraph(df):
    # Column names are the years (x-axis); each row supplies the values (y-axis)
    for idx, row in df.iterrows():
        plt.plot(df.columns, row.values, marker='o', label=str(idx))
    plt.xlabel('Year')
    plt.legend()
    plt.show()

linegraph(df)

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the "timestamp" column for a given "userId".

import pandas as pd import datetime import numpy as np df_ratings = pd.read_csv('C:/Users/User/Downloads/ml-latest-small/ratings.csv') def diff_pd(x): return x.max() - x.min() df_ratings.groupby('userId')['timestamp'].agg([diff_pd]).mean()

Python
View
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def rock(dataframe):
    # Keep only the rows where neither class column equals 'Rock'
    mask = (dataframe['Class 1'] == 'Rock') | (dataframe['Class 2'] == 'Rock')
    return dataframe[~mask]

grass = rock(grass)

Python
View
generate functionWed, 22 Mar 2023

Write code that correctly compares two different dataframes: p['Speed']] boosted_p['Speed'] using the stats.f_oneway method in Python

from scipy import stats

# p and boosted_p are the two dataframes from the task; NaN values would make the result NaN
f_stat, p_value = stats.f_oneway(p['Speed'].dropna(), boosted_p['Speed'].dropna())
print('F=%.3f, p=%.3f' % (f_stat, p_value))

Python
View
generate functionThu, 15 Dec 2022

Create a function that allows you to move values between directories keys with nested lists inside. It should check for: 1) if the specified key is in the dictionary and show ERROR NO SUCH KEY if it does not exist 2) if the value is present in the dictionary and show ERROR NO SUCH VALUE if it doesn't exist 3) if both the key and the value are in the dictionary, the function should move the value v to another k and return a new dictionary. Note that dictionary values are lists! Example: Input: directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } v = '11-2' k = '3' Output: directories = { '1': ['2207 876234'], '2': ['10006'], '3': ['11-2'] }

def move(directories, v, k):
    if k not in directories:
        return 'ERROR NO SUCH KEY'
    for docs in directories.values():
        if v in docs:
            # Remove the value from its current key and append it to the target key
            docs.remove(v)
            directories[k].append(v)
            return directories
    return 'ERROR NO SUCH VALUE'

print(move(directories, '11-2', '3'))

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

import pandas as pd df = pd.read_csv('data/charts.csv') dfs = df.copy() def chart_debut_format(chart_debut): return chart_debut[:4] dfs['chart_debut'] = dfs['chart_debut'].apply(chart_debut_format) dfs.head()

Python
View
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 convert the id column from float format to int

df_new.id = df_new.id.astype(int)

Python
View
generate functionTue, 20 Dec 2022

Change the date_range function to return a list of all dates between start and end in list def date_range(start_date, end_date): lst = [] if start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) lst.append(start.strftime('%Y-%m-%d')) start += td(days=1) return lst except: return [] Examples: Input: date_range('2022-12-01', '2022-12-05') Output: ['2022-12-01', '2022-12-02', '2022-12-03', '2022-12-04', '2022-12-05']

from datetime import datetime as dt, timedelta as td

def date_range(start_date, end_date):
    lst = []
    try:
        start = dt.strptime(start_date, '%Y-%m-%d')
        end = dt.strptime(end_date, '%Y-%m-%d')
    except (TypeError, ValueError):
        return []
    # Append every day from start to end inclusive; an inverted range yields []
    while start <= end:
        lst.append(start.strftime('%Y-%m-%d'))
        start += td(days=1)
    return lst

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe in this format: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singing in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 The Seventh Seal (1957) 5.0 9 Bridge Over the River Kwai (1957) 4.0 Write a function that will pull the year of the movie from its title and add it to the "year" column

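import re

def extract_year(title):
    # Take the last four-digit number in parentheses, so "(Trois couleurs: Rouge) (1994)" gives '1994'
    years = re.findall(r'\((\d{4})\)', title)
    return years[-1] if years else None

df['year'] = df['title'].apply(extract_year)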

Python
View
generate functionTue, 13 Dec 2022

create a password verification function. The verification conditions are: the length should be bigger than 6; should contain at least one digit, but it cannot consist of just digits; if the password is longer than 9 - previous rule (about one digit), is not required. Input: A string. Output: A bool. Examples: assert is_acceptable_password("short") == False assert is_acceptable_password("short54") == True assert is_acceptable_password("muchlonger") == True assert is_acceptable_password("ashort") == False

def is_acceptable_password(password: str) -> bool:
    if len(password) <= 6:
        return False
    # Passwords longer than 9 characters are exempt from the digit rule
    if len(password) > 9:
        return True
    return any(c.isdigit() for c in password) and not password.isdigit()

Python
View
generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

import numpy as np A = np.array([6, 8, 8, 10, 12, 12, 12, 12, 12, 14, 14, 14, 14, 16, 16, 16, 18, 18, 18, 18, 22]) B = np.array([10, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20])

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write code that shows the best decade with a maximum rating of 5.0

# import pandas
import pandas as pd
# import the data
data = pd.read_csv('https://s3.amazonaws.com/assets.datacamp.com/production/course_2023/datasets/imdb_1000.csv')
# check the data
data
# check the shape of the data
data.shape
# drop the rows with null values
data.dropna(inplace=True)
# check the shape of the data
data.shape
# find the best decade: the one with the most 5.0 ratings comes first
data[data['rating'] == 5.0].groupby('decade')['rating'].count().sort_values(ascending=False)

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: 35929 Passage de Venus (1874) 3.027778 1874 40331 Sallie Gardner at a Gallop (1878) 2.909091 1878 4195 Athlete Swinging a Pick (1880) 2.666667 1880 8085 Buffalo Running (1883) 2.636364 1883 29860 Man Walking Around a Corner (1887) 1.750000 1887 53932 Traffic Crossing Leeds Bridge (1888) 2.375000 1888 36445 Pferd und Reiter Springen Über ein Hindernis (... 2.583333 1888 1778 Accordion Player (1888) 1.928571 1888 39900 Roundhay Garden Scene (1888) 2.605263 1888 Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

def decade(x):
    # Accept ints as well as digit strings; anything else becomes NaN
    x = str(x)
    if x.isdigit():
        start = int(x) // 10 * 10
        return str(start) + "-" + str(start + 10)
    return float('NaN')

df['Decade of Release'] = df['Year'].apply(decade)
df

Python
View
generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp)

Python
View
generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='What is the most preferred working environment for you?') fig.show()

import plotly.express as px

# Vertical bars: categories on the x-axis, counts as bar heights and bar labels
fig = px.bar(x=label, y=counts, text=counts)
fig.update_layout(title_text='What is the most preferred working environment for you?', showlegend=False)
fig.show()

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

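def year_leaders(df):
    # Row label of the largest num_of_hits within each chart_debut year
    leaders = df.loc[df.groupby('chart_debut')['num_of_hits'].idxmax()]
    # Keep only the requested columns, indexed by year
    return leaders.set_index('chart_debut')[['performer', 'hits']]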

Python
View
generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp)

Python
View
generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy Add it to output all 66 lines of the dataframe

def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) print(dfp_copy) return dfp_copy

Python
View
generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, only the 2nd value must be replaced by replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"], ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"], ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"], ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace_numbers(stnum, stus, repnum): for num, stu in enumerate(stus): if stu[0] == stnum: stu[0] = repnum return stus replace_numbers('4004', students, '9090')

Python
View
generate functionFri, 17 Mar 2023

Determine the sample size needed to study the average coffee bill in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

import scipy.stats as st

def checkio(delta, sigsqr, conf):
    # Main function: it is used for auto-testing and must return a result for the check.
    # Two-sided z quantile for the given confidence level
    z = st.norm.ppf((1 + conf) / 2)
    # n = z^2 * sigma^2 / delta^2, rounded to the nearest integer
    n = z ** 2 * sigsqr / delta ** 2
    return int(round(n))

# These asserts are only for self-checking and are not needed for auto-testing
if __name__ == '__main__':
    assert checkio(0.02, 0.04, 0.95) == 384
    assert checkio(0.02, 0.005, 0.95) == 48
    print("Coding complete? Click 'Check' to earn cool rewards!")

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

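def year_leaders(df):
    # Sort so the performer with the most hits comes first within each year,
    # then keep that first row per chart_debut
    leaders = (df.sort_values(['chart_debut', 'num_of_hits'], ascending=[True, False])
                 .drop_duplicates(subset='chart_debut', keep='first'))
    return leaders[['chart_debut', 'performer', 'hits']].reset_index(drop=True)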

Python
View
generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! Example: dish = 'salad' portions = 5 The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}

def total_ingridients(): dish = 'salad' portions = 5 cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} print(f'{dish}') for ing in cook_book[dish]: print(f'{ing["ingridient_name"]}: {ing["quantity"]*portions}{ing["measure"]}')
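The note about counting the same ingredient with different units separately matters once several dishes are combined. A sketch of that aggregation, keyed by the (name, measure) pair; the function name `shopping_list` and the second dish in the usage example are made up for illustration and are not part of the original recipe book.

```python
from collections import defaultdict

def shopping_list(cook_book, dishes, portions):
    """Total ingredient amounts for the given dishes, per (name, measure)."""
    totals = defaultdict(int)
    for dish in dishes:
        for ing in cook_book[dish]:
            # Same name but different measure gets its own entry
            totals[(ing['ingridient_name'], ing['measure'])] += ing['quantity'] * portions
    return dict(totals)

cook_book = {
    'salad': [
        {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'},
        {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'},
    ],
    # Hypothetical second dish, not from the original recipe book
    'pizza': [
        {'ingridient_name': 'cheese', 'quantity': 20, 'measure': 'g'},
        {'ingridient_name': 'tomatoes', 'quantity': 100, 'measure': 'g'},
    ],
}
totals = shopping_list(cook_book, ['salad', 'pizza'], 5)
for (name, measure), amount in totals.items():
    print(f'{name.capitalize()}: {amount} {measure}')
```

Here cheese is summed across both dishes, while tomatoes measured in pieces and tomatoes measured in grams stay on separate lines.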

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

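def year_leaders(df):
    # Mark the rows whose num_of_hits equals the maximum for their chart_debut year
    # (ties within a year are all kept)
    is_leader = df['num_of_hits'] == df.groupby('chart_debut')['num_of_hits'].transform('max')
    return df.loc[is_leader, ['chart_debut', 'performer', 'hits']]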

Python
View
generate functionTue, 13 Dec 2022

create a password verification function. The verification conditions are: the length should be bigger than 6; should contain at least one digit, but it cannot consist of just digits; if the password is longer than 9 - previous rule (about one digit), is not required. Input: A string. Output: A bool. Examples: assert is_acceptable_password("short") == False assert is_acceptable_password("short54") == True assert is_acceptable_password("muchlonger") == True assert is_acceptable_password("ashort") == False

def is_acceptable_password(password: str) -> bool:
    if len(password) <= 6:
        return False
    if len(password) > 9:
        return True
    # 7 to 9 characters: needs a digit, but must not be all digits
    has_digit = any(ch.isdigit() for ch in password)
    return has_digit and not password.isdigit()

Python
View
generate functionThu, 30 Mar 2023

There are two vectors: film_2 = np.array([0,1,0,1,0,0]) film_4 = np.array([1,0,0,1,0,1]) We know that the angle between them is 65.9 degrees. Construct a graph with a visual representation of the vectors and the angle between them.

import numpy as np import matplotlib.pyplot as plt import matplotlib.patches as patches fig = plt.figure() ax = fig.add_subplot(111,aspect='equal') vec_1 = np.array([1,1]) vec_2 = np.array([1,0]) x_coords = [0, vec_1[0], vec_2[0]] y_coords = [0, vec_1[1], vec_2[1]] plt.plot(x_coords, y_coords, color='black') plt.arrow(0, 0, vec_1[0], vec_1[1], head_width=0.1, head_length=0.1, color='black') plt.arrow(0, 0, vec_2[0], vec_2[1], head_width=0.1, head_length=0.1, color='black') angle = np.arccos(np.dot(vec_1,vec_2)
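Before plotting, the stated angle can be checked directly from the two vectors in the prompt. A small sketch in plain Python; the helper name `angle_degrees` is hypothetical.

```python
import math

def angle_degrees(u, v):
    """Angle between two vectors via cos(theta) = u.v / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

film_2 = [0, 1, 0, 1, 0, 0]
film_4 = [1, 0, 0, 1, 0, 1]
print(round(angle_degrees(film_2, film_4), 1))  # 65.9
```

Six-dimensional vectors cannot be drawn as they are, so a 2D figure can only show two arrows separated by this angle, scaled to the two vector lengths.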

Python
View
generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

def highlight(df):
    # Styler methods return the Styler itself, so both highlights can be chained.
    # Note that the result is a Styler object, not a plain dataframe.
    props = 'color:white; font-weight:bold; background-color:{};'
    return (df.style
              .highlight_max(axis=1, props=props.format('green'))
              .highlight_min(axis=1, props=props.format('brown')))

Python
View
generate functionThu, 15 Dec 2022

Create a function that adds the following variables docnum = '42' doctype = 'multipassport' docowner = 'R2D2' shelf = 3 into lists and dictionaries with the corresponding structures documents = [ {'type': doctype, 'number': docnum, 'name': docowner}, ] directories = { shelf: [] }

def new_doc_add(): docnum = input('Enter the number of your document: ') doctype = input('Enter the type of your document: ') docowner = input('Enter the owner of your document: ') shelf = input('Enter the shelf number: ') documents.append({'type': doctype, 'number': docnum, 'name': docowner}) if shelf in directories: directories[shelf].append(docnum) else: directories[shelf] = [docnum]
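The prompt supplies the values as variables rather than keyboard input. A sketch of the same idea with parameters instead of `input()` calls; the function name `add_document` and its signature are assumptions for illustration.

```python
def add_document(documents, directories, doctype, docnum, docowner, shelf):
    """Append a document record and file its number on the given shelf."""
    documents.append({'type': doctype, 'number': docnum, 'name': docowner})
    # setdefault creates the shelf list on first use
    directories.setdefault(shelf, []).append(docnum)
    return documents, directories

documents = []
directories = {}
add_document(documents, directories, 'multipassport', '42', 'R2D2', 3)
print(documents)
print(directories)
```

Passing the two containers in explicitly keeps the function free of global state, and the shelf key keeps whatever type the caller uses (here the integer 3, as in the prompt).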

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column. All values of both time_on_chart and consecutive_weeks columns must be summed if the performer value matches.

song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks hits Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 Stupid Cupid Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 Chantilly Lace Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 Chantilly Lace Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 Chantilly Lace Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Chantilly Lace

Python
View
generate functionTue, 21 Mar 2023

Calculate mean, median values of this list water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1]

import math

def mean(numbers):
    total_sum = 0
    for n in numbers:
        total_sum += n
    count = len(numbers)
    avg = total_sum / count
    return avg

def median(numbers):
    numbers = sorted(numbers)  # sort a copy so the caller's list is not modified
    count = len(numbers)  # get the length of the list
    is_even = count % 2 == 0  # check if this list is of even length
    if is_even:
        # find the two numbers in the middle of the list
        mid = math.floor(count / 2)
        a = numbers[mid - 1]
        b = numbers[mid]
        # find the average of these two numbers
        ans = mean([a, b])
    else:
        ans = numbers[math.floor(count / 2)]
    return ans
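The standard library already covers both calculations. A short sketch applying the `statistics` module to the list from the prompt:

```python
import statistics

water = [1, 2, 3, 4, 2, 4, 2, 4, 5, 2, 3, 4, 2, 1, 3, 4, 3, 2, 5, 1]

water_mean = statistics.mean(water)      # arithmetic mean
water_median = statistics.median(water)  # middle value of the sorted list
print(water_mean, water_median)
```

For an even-length list such as this one, `statistics.median` averages the two middle values.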

Python
View
generate functionSat, 18 Feb 2023

regions = ['North America', 'Europe', 'Japan', 'Other'] na_sales = df['NA_Sales'].sum() eu_sales = df['EU_Sales'].sum() jp_sales = df['JP_Sales'].sum() oth_sales = df['Other_Sales'].sum() Write a function that create a pd.DataFrame with this data. Example output: North America 1111 Europe 1234 Japan 3456 Other 5678

def create_sales_df(df): return pd.DataFrame({'regions': ['North America', 'Europe', 'Japan', 'Other'], 'sales': [df['NA_Sales'].sum(), df['EU_Sales'].sum(), df['JP_Sales'].sum(), df['Other_Sales'].sum()]}) create_sales_df(df)

Python
View
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

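def delete_row_with_rock(df):
    # Keep only the rows where neither class column equals 'Rock'
    has_rock = (df['Class 1'] == 'Rock') | (df['Class 2'] == 'Rock')
    return df[~has_rock]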

Python
View
generate functionSun, 18 Dec 2022

Rewrite the code with a command that removes unnecessary quotation marks (' ') in the dictionary: Here's the problem: Output: {' "user_id"': ' "category"}', ' "1840e0b9d4"': ' "Products"}', How it should be: Output: {"user_id": "category"}, "1840e0b9d4": "Products"}, The code: f = open('purchase_log.txt', encoding='utf-8') import re import json purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

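def remove_quotes(d):
    # Build a new dict: adding keys to d while iterating over it would raise an error
    cleaned = {}
    for k, v in d.items():
        # Strip spaces, double quotes and braces from both ends
        cleaned[k.strip(' "{}')] = v.strip(' "{}')
    return cleaned

remove_quotes({' "user_id"': ' "category"}', ' "1840e0b9d4"': ' "Products"}'})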

Python
View
generate functionTue, 20 Dec 2022

Change the date_range function to count the difference between end and start and then loop a list of all dates in that range def date_range(start_date, end_date): lst = [] if start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) lst.append(start.strftime('%Y-%m-%d')) start += td(days=1) return lst except: return [] Examples: Input: date_range('2022-12-01', '2022-12-05') Output: ['2022-12-01', '2022-12-02', '2022-12-03', '2022-12-04', '2022-12-05']

from datetime import datetime as dt, timedelta as td

def date_range(start_date, end_date):
    lst = []
    try:
        start = dt.strptime(start_date, '%Y-%m-%d')
        end = dt.strptime(end_date, '%Y-%m-%d')
    except ValueError:
        # Badly formatted dates give an empty list
        return []
    # Number of days between the two dates; negative when start is after end
    days = (end - start).days
    for offset in range(days + 1):
        lst.append((start + td(days=offset)).strftime('%Y-%m-%d'))
    return lst

Python
View
generate functionSat, 28 Jan 2023

There is a Pandas dataframe: date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 2 2021-07-01 Yalta Skyline 15 12 38016 11655 8775 17586 3 2021-07-01 Alushta Breeze 36 25 75475 27972 21060 26443 4 2021-07-01 Alushta Moreon 20 17 51091 15540 11700 23851 5 2021-07-01 Gurzuf Alpina 14 12 38736 10878 8190 19668 Write a function divide_hotels that creates new columns big_hotels, medium_hotels, small_hotels, and adds values according to the following conditions: 1) if df['total_rooms'] > 30, then profit value is substituted into big_hotels column 2) if df['total_rooms'] > 20, then the profit value is substituted in the medium_hotels column 3) if df['total_rooms'] > 10, then profit value is substituted for small_hotels column

def divide_hotels(df): df['big_hotels'] = df.apply(lambda x: x['profit'] if x['total_rooms'] > 30 else 0, axis=1) df['medium_hotels'] = df.apply(lambda x: x['profit'] if x['total_rooms'] <= 30 and x['total_rooms'] > 20 else 0, axis=1) df['small_hotels'] = df.apply(lambda x: x['profit'] if x['total_rooms'] <= 20 and x['total_rooms'] > 10 else 0, axis=1) return df df = divide_hotels(df) df.head()

Python
View
generate functionWed, 18 Jan 2023

There is a dictionary: geo_data = {'Center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['Petersburg', 'Pskov', 'Murmansk'], 'Far East': ['Vladivostok', 'Sakhalin', 'Khabarovsk']} Write a function that will search the dictionary values for a given string and, if it is found, return the corresponding dictionary key. For example: geo_class('Pskov') Result: 'Northwest'

geo_data = {'Center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['Petersburg', 'Pskov', 'Murmansk'], 'Far East': ['Vladivostok', 'Sakhalin', 'Khabarovsk']} def geo_class(city): for region in geo_data: if city in geo_data[region]: return region geo_class('Pskov')

Python
View
generate functionMon, 13 Mar 2023

There is a logistic linear discriminant model trained using the following formula: from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(data, data_class, random_state=42) from sklearn.discriminant_analysis import LinearDiscriminantAnalysis lda.fit(X_train, y_train) Write code to evaluate its quality with F1 measure

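from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import f1_score, precision_score, recall_score

def my_evaluation(x_train, y_train, x_test, y_test):
    # Fit the model on the training split and predict the test split
    lda = LinearDiscriminantAnalysis()
    lda.fit(x_train, y_train)
    y_pred = lda.predict(x_test)
    # average='weighted' is an assumption so that multi-class targets also work
    f1 = f1_score(y_test, y_pred, average='weighted')
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    return f1, precision, recall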

Python
View
generate functionTue, 24 Jan 2023

there is a Pandas dataframe: timestamp visit_id url region user_id traffic_source 0 1549980692 e3b0c44298 https://host.ru/3c19b4ef7371864fa3 Russia b1613cc09f yandex 1 1549980704 6e340b9cff https://host.ru/c8d9213a31839f9a3a Germany 4c3ec14bee direct 2 1549980715 96a296d224 https://host.ru/b8b58337d272ee7b15 USA a8c40697fb yandex 3 1549980725 709e80c884 https://host.ru/b8b58337d272ee7b15 Italy 521ac1d6a0 yandex 4 1549980736 df3f619804 https://host.ru/b8b58337d272ee7b15 Russia d7323c571c yandex Create a new dataframe summary in which count how many traffic_source values relate to each region

def count_sources_per_region(df): return df.groupby(['region'])['traffic_source'].count()

Python
View
generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, only the 2nd value must be replaced by replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"], ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"], ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"], ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace_num(stnums, students, replacing_num):
    for num in stnums:
        seen = False
        for st in students:
            if st[0] == num:
                # Leave the first occurrence; replace the number of later ones
                if seen:
                    st[0] = replacing_num
                seen = True
    return students

replace_num(stnums, students, replacing_num)

Python
View
generate functionMon, 26 Dec 2022

A matrix matrix is given. matrix = [ [0,1,2,4,8], [6,2,2,1,9], [3,3,3,3,3], [4,6,7,1,2], [5,7,3,4,0] ] Write code (in pure Python, no third-party libraries) that will transpose it (that is, make columns out of rows) into matrix_t: matrix_t = [ [0, 6, 3, 4, 5], [1, 2, 3, 6, 7], [2, 2, 3, 7, 3], [4, 1, 3, 1, 4], [8, 9, 3, 2, 0] ]

def transpose_matrix(A): rows = len(A) cols = len(A[0]) B = [[0 for i in range(0, rows)] for j in range(0, cols)] for i in range(0, rows): for j in range(0, cols): B[j][i] = A[i][j] return B
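For comparison, the same transposition can be written with the built-in `zip`, which pairs up the i-th elements of every row. A short sketch; the name `transpose_zip` is hypothetical.

```python
def transpose_zip(matrix):
    """Transpose a rectangular list of lists using zip."""
    # zip(*matrix) yields one tuple per column; convert each to a list
    return [list(column) for column in zip(*matrix)]

matrix = [
    [0, 1, 2, 4, 8],
    [6, 2, 2, 1, 9],
    [3, 3, 3, 3, 3],
    [4, 6, 7, 1, 2],
    [5, 7, 3, 4, 0],
]
matrix_t = transpose_zip(matrix)
print(matrix_t[0])  # [0, 6, 3, 4, 5]
```

Both versions assume every row has the same length; with ragged rows, `zip` silently stops at the shortest one.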

Python
View
generate functionWed, 21 Dec 2022

Change the class so that when self.seniority is added, the designer is assigned 1 point and when self.intlawards is added, the designer is assigned 2 points. Correct the condition so that these points add up correctly: if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() Example class: class Designer(Employee): def __init__(self, name, seniority, awards=2): super().__init__(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

class Designer(Employee):
    def __init__(self, name, seniority, awards=2):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # for each accreditation, increase the counter by 1
        # for now we assume that all designers are accredited
        self.seniority += 1
        # condition for promoting an employee from the presentation:
        # seniority counts 1 point, each international award counts 2 points
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        # publication of the results
        return self.publish_grade()

Python
View
generate functionMon, 27 Mar 2023

There is a dataframe df with two columns: Category and Message in the following format: Category Message 0 ham Go until jurong point, crazy... Available only ... 1 ham Ok lar... Joking wif u oni... 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 3 ham U dun say so early hor... U c already then say... 4 ham Nah I don't think he goes to usf, he lives aro... Split it into samples X = df['Message'] y = df['Category'] so that there are float numbers instead of object text everywhere.

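from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Turn the message text into a float TF-IDF matrix and the labels into 0.0 / 1.0
X = TfidfVectorizer().fit_transform(df['Message'])
y = df['Category'].map({'ham': 0.0, 'spam': 1.0})

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)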

Python
View
generate functionSat, 28 Jan 2023

Construct 4 seaborn bar graphs from these data: 1) df1['city'].value_counts() 2) df2['hotel'].value_counts() 3) df2['date'].value_counts() 4) df2['rating'].value_counts()

import matplotlib.pyplot as plt
import seaborn as sns

def bar_graphs(df1, df2):
    plt.figure(figsize=(15, 7))
    # One (column, title) pair per panel of the 2x2 grid
    panels = [
        (df1['city'], 'City'),
        (df2['hotel'], 'Hotel'),
        (df2['date'], 'Date'),
        (df2['rating'], 'Rating'),
    ]
    for i, (column, title) in enumerate(panels, start=1):
        counts = column.value_counts()
        plt.subplot(2, 2, i)
        # x and y are passed by keyword, as recent seaborn versions require
        sns.barplot(x=counts.index, y=counts.values)
        plt.xticks(rotation=40)
        plt.title(title)
    plt.tight_layout()
    plt.show()

Python
View
generate functionTue, 21 Mar 2023

Rewrite this code to calculate stats instead of mannwhitneyu by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

from scipy.stats import wilcoxon

# Note: wilcoxon is a paired test, so both samples must have the same length
stat, p = wilcoxon(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds'])
print('Wilcoxon Statistics=%.3f, p=%.3f' % (stat, p))

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

def year_leaders(df): df=df.groupby(df.chart_debut).apply(lambda x: x.sort_values('num_of_hits',ascending=False)).reset_index(drop=True) df=df.groupby('chart_debut').head(1) return df year_leaders(df)

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

# Select every row whose name occurs more than once, including the first occurrence
duplicates = df[df['name'].duplicated(keep=False)]
# Sort by name in ascending order and display the result
duplicates = duplicates.sort_values('name')
duplicates

Python
View
generate functionMon, 27 Mar 2023

There is a dataframe df with two columns: Category and Message in the following format: Category Message 0 ham Go until jurong point, crazy... Available only ... 1 ham Ok lar... Joking wif u oni... 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 3 ham U dun say so early hor... U c already then say... 4 ham Nah I don't think he goes to usf, he lives aro... Split it into samples X = df['Message'] y = df['Category'] so that there are float numbers instead of object text everywhere.

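from sklearn.feature_extraction.text import CountVectorizer

def split_dataframe(df):
    # Word counts as floats instead of raw text
    X = CountVectorizer(dtype=float).fit_transform(df['Message'])
    # ham -> 0.0, spam -> 1.0
    y = (df['Category'] == 'spam').astype(float)
    return X, y

split_dataframe(df)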

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

def group_by_title(df): return df.groupby('title')['rating'].mean()

Python
View
generate functionWed, 18 Jan 2023

Fix this function so that when it finds a value in a Pandas dataframe string, it returns its key: def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data.values(): if value in row: return key return 'undefined'

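def geo_class(row):
    geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']}
    # Walk key and city list together, so the returned key belongs to the matched city
    for key, cities in geo_data.items():
        for city in cities:
            # Case-insensitive substring match, since the dictionary mixes cases
            if city.lower() in row.lower():
                return key
    return 'undefined'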

Python
View
generate functionSat, 25 Feb 2023

There is a Pandas dataframe: id grade rate_group 0 1077501 B 10-11% 1 1077430 C 15-17% 2 1077175 C 15-17% 3 1076863 C 13-14% 4 1075358 B 12-13% 5 1075269 A 7-8% 6 1069639 C 15-17% 7 1072053 E 17-25% 8 1071795 F 17-25% 9 1071570 B 12-13% Modify the dataframe so that grades are in columns and rate_group in rows.

# Count the ids for each rate_group (rows) and grade (columns)
pd.crosstab(df['rate_group'], df['grade'])

Python
View
generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! Example of program operation: portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g

def get_quantity(cook_book, key, portions):
    # Scale every ingredient of the chosen dish, whatever the dish is called
    for item in cook_book[key]:
        total = item['quantity'] * portions
        print(f"{item['ingridient_name'].capitalize()}: {total} {item['measure']}")

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

import pandas as pd df = pd.DataFrame({ 'user_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'name': ['Ksenia Rodionova', 'Ulyana Selezneva', 'Konstantin Prokhorov', 'Petrov Vladimir', 'Arina Selivanova', 'Svetlana Kuznecova', 'Evgeniy Laptev', 'Ivan Ryzhkov', 'Sidorov Nikolay', 'Nikolay Ivanov', 'Natalya Volkova', 'Maksim Petrov', 'Maksim Petrov', 'Viktor Fomichev', 'Ulyana Selezneva'], 'date': ['2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-02', '2021-07-02', '2021-07-02', '

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe in this format: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singing in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 The Seventh Seal (1957) 5.0 9 Bridge Over the River Kwai (1957) 4.0 Write a function that will pull the year of the movie from its title and add it to the "year" column

def get_year(title): return int(title.split()[-1][1:-1]) df['year'] = df['title'].apply(get_year)
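The slicing above only works when every title ends in a four-digit year in parentheses. A more defensive sketch, assuming some titles may lack a year, could use a regular expression; `extract_year` is a hypothetical helper name, not part of the generated answer.

```python
import re

# A four-digit year in parentheses at the very end of the title.
YEAR_AT_END = re.compile(r"\((\d{4})\)\s*$")


def extract_year(title):
    """Return the release year as an int, or None when the title has no trailing (YYYY)."""
    match = YEAR_AT_END.search(title)
    return int(match.group(1)) if match else None


print(extract_year("Three Colors: Red (Trois couleurs: Rouge) (1994)"))
print(extract_year("Untitled"))
```

With pandas it could be applied in the same way as the original, for example df['year'] = df['title'].apply(extract_year).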

Python
View
generate functionMon, 27 Mar 2023

There is a variable tfidf that contains numbers in this format: [(0, -0.34657359027997264), (1, -0.34657359027997264), (2, -0.34657359027997264), (3, -0.34657359027997264), (4, -0.34657359027997264), (5, -1.0986122886681098)...] When loading the cosine similarities code from gensim import similarities cos_sim = similarities.MatrixSimilarity(tfidf[bows]) The following error appears: TypeError: unhashable type: 'list'. Fix this error

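from gensim import models, similarities

# tfidf has to be a fitted TfidfModel, not a list of (id, weight) pairs;
# this assumes bows is the bag-of-words corpus the weights were computed from
tfidf = models.TfidfModel(bows)
cos_sim = similarities.MatrixSimilarity(tfidf[bows])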

Python
View
generate functionMon, 12 Dec 2022

Create a program that will take two lists of integers, a and b. Each list will consist of 3 positive integers above 0, representing the dimensions of cuboids a and b. You must find the difference of the cuboids' volumes regardless of which is bigger. For example, if the parameters passed are ([2, 2, 3], [5, 4, 1]), the volume of a is 12 and the volume of b is 20. Therefore, the function should return 8. Write solution in one line of code.

from math import prod
def find_difference(a, b): return abs(prod(a) - prod(b))

Python
View
generate functionWed, 21 Dec 2022

Change this function so that the employee's awards are also taken into account (add this variable to the class) and the employee's promotion condition is taken into account by the formula if (self.seniority)+(self.intlawards*2) % 7 == 0: self.grade_up() class Developer(Employee): def __init__(self, name, seniority): super().__init__(name, seniority) def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all of the developers pass the accreditation self.seniority += 1 # condition of promoting an employee from the presentation if self.seniority % 5 == 0: self.grade_up() # publication of the results return self.publish_grade()
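The listing shows no generated answer for this prompt. One detail is worth noting before writing one: in Python `%` binds tighter than `+`, so the formula as written evaluates as seniority + ((intlawards*2) % 7), which can never equal 0 for a positive seniority. The sketch below assumes the intent was to take the whole sum modulo 7. The `Employee` base class is a hypothetical stand-in, since the original is not shown, and the `awards=0` default is also an assumption.

```python
class Employee:
    """Hypothetical stand-in for the base class, which the source does not show."""

    def __init__(self, name, seniority):
        self.name = name
        self.seniority = seniority
        self.grade = 1

    def grade_up(self):
        self.grade += 1

    def publish_grade(self):
        return f"{self.name} {self.grade}"


class Developer(Employee):
    def __init__(self, name, seniority, awards=0):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # each accreditation adds one point of seniority
        self.seniority += 1
        # parenthesised so that the whole score is taken modulo 7
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        return self.publish_grade()


dev = Developer("Ann", 2, awards=2)
print(dev.check_if_it_is_time_for_upgrade())
```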

Python
View
generate functionSat, 04 Mar 2023

Correct the code to correctly output the dataframe with this data: gdpdiff = pd.DataFrame({'Top1': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1], 'Top20': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20]}) gdpdiff

def top20(df):
    ranked = df[['Country or region', 'GDP per capita']]\
        .sort_values(by='GDP per capita', ascending=False).head(20)
    # Keep the first and the twentieth country as two labelled rows
    gdpdiff = ranked.iloc[[0, 19]].reset_index(drop=True)
    gdpdiff.index = ['Top1', 'Top20']
    return gdpdiff

top20(df19)

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

def data_clean(df):
    # Work on a copy so the original dataframe stays unchanged
    dfs = df.copy()
    dfs['chart_debut'] = pd.to_datetime(dfs['chart_debut']).dt.year
    return dfs

dfs = data_clean(df)

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the timestamp column for a given userId.

import pandas as pd

ratings = pd.read_csv('ratings.csv')

def aver_lifetime(data):
    # Lifetime per user in seconds: last rating timestamp minus the first one
    span = data.groupby('userId')['timestamp'].agg(['min', 'max'])
    lifetime = span['max'] - span['min']
    return pd.to_timedelta(lifetime.mean(), unit='s')

aver_lifetime(ratings)

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: 35929 Passage de Venus (1874) 3.027778 1874 40331 Sallie Gardner at a Gallop (1878) 2.909091 1878 4195 Athlete Swinging a Pick (1880) 2.666667 1880 8085 Buffalo Running (1883) 2.636364 1883 29860 Man Walking Around a Corner (1887) 1.750000 1887 53932 Traffic Crossing Leeds Bridge (1888) 2.375000 1888 36445 Pferd und Reiter Springen Über ein Hindernis (... 2.583333 1888 1778 Accordion Player (1888) 1.928571 1888 39900 Roundhay Garden Scene (1888) 2.605263 1888 Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

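import numpy as np

def assignDecade(x):
    # Values that cannot be read as a year (for example 'raovyavas') become NaN
    try:
        start = int(x) // 10 * 10
    except (TypeError, ValueError):
        return np.nan
    return f"{start}-{start + 10}"

df['Decade of Release'] = df['Year'].map(assignDecade)
df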

Python
View
generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

def highlight_min_max(df1):
    props = 'color:white; font-weight:bold; background-color:{};'
    # Both calls are chained on the same Styler, which is what gets returned
    return (df1.style
            .highlight_max(axis=1, props=props.format('green'))
            .highlight_min(axis=1, props=props.format('brown')))

highlight_min_max(df2)

Python
View
generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

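def replace_nums(students, stnums, replacing_num):
    seen = set()
    for st in students:
        num = st[0]
        if num in stnums:
            if num in seen:
                # second and later occurrences get the replacement number
                st[0] = replacing_num
            else:
                # the first occurrence is left untouched
                seen.add(num)
    return students

print(replace_nums(students, stnums, replacing_num))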

Python
View
generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

from scipy import stats

def interval(n, mean, sig, conf):
    # h is the half-width of the interval; the mean does not affect the length
    h = sig * stats.norm.ppf((1 + conf) / 2) / (n ** 0.5)
    return int(round(h * 2))
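The same calculation can be sketched with the standard library only, assuming a normal (z) interval with known standard deviation. `statistics.NormalDist` stands in for scipy here; this is an alternative, not the generated answer.

```python
from statistics import NormalDist


def interval(n, mean, sig, conf):
    # two-sided critical value for the requested confidence level
    z = NormalDist().inv_cdf((1 + conf) / 2)
    # half-width of the interval; the sample mean only shifts the interval
    h = z * sig / n ** 0.5
    return round(2 * h)


print(interval(100, 50, 10, 0.95))
```

For n = 100, sig = 10 and conf = 0.95 the half-width is about 1.96, so the rounded length is 4.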

Python
View
generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 210 g Tomatoes: 6 pcs. Cucumbers: 60g Olives: 30 g Olive oil: 60 ml Lettuce: 30 gr Pepper: 60 gr

def get_shop_list_by_dishes(dishes, person_count): cook_book = { 'salad': [ {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'gr'}], 'cucumbers': [ {'ingridient_name': 'cucumbers', 'quantity': 20, 'measure': 'gr'}, {'ingridient_name': 'pepper', 'quantity': 8, 'measure': 'gr'}, {'ingridient_name': 'olives', 'quantity': 8, 'measure': 'gr'}, {'ingridient_name': 'olive oil', 'quantity': 30, 'measure': 'ml'}], 'olives': [
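The generated answer above is cut off. A minimal sketch of the scaling step follows, with two assumptions: `cook_book` is passed in as an argument instead of being defined inside the function, and totals are keyed by (name, measure) so that the same ingredient in different units is counted separately.

```python
from collections import defaultdict


def get_shop_list_by_dishes(cook_book, dishes, person_count):
    totals = defaultdict(int)
    for dish in dishes:
        for item in cook_book[dish]:
            # same ingredient in a different unit gets its own entry
            key = (item['ingridient_name'], item['measure'])
            totals[key] += item['quantity'] * person_count
    return dict(totals)


cook_book = {'salad': [
    {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'},
    {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'},
    {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'gr'}]}

for (name, measure), quantity in get_shop_list_by_dishes(cook_book, ['salad'], 3).items():
    print(f"{name.capitalize()}: {quantity} {measure}")
```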

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe: date hotel revenue av_revenue difference in_percent 5 2021-07-01 Alpina 38736 NaN NaN NaN 11 2021-07-02 Moreon 40766 NaN NaN NaN 17 2021-07-03 Alpina 30330 NaN NaN NaN 23 2021-07-04 Galaxia 50180 NaN NaN NaN 29 2021-07-05 Optima 27006 NaN NaN NaN 1) Write a function that calculates the average revenue for each hotel and adds these values to the "av_revenue" column 2) Write a function that calculates the difference between the average revenue and the values in the "revenue" column and adds these calculations to the "difference" column 3) Write a function that calculates the difference between the average revenue and the values in the "revenue" column as a percentage and adds those calculations to the "in_percent" column

def av_revenue(df): df['av_revenue'] = df.groupby('hotel')['revenue'].transform('mean') return df def difference(df): df['difference'] = df['av_revenue'] - df['revenue'] return df def in_percent(df): df['in_percent'] = 100 * df['difference'] / df['av_revenue'] return df

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

import numpy as np
import pandas as pd

def decade(year):
    # Non-numeric values such as 'raovyavas' map to NaN
    try:
        start = int(year) // 10 * 10
    except (TypeError, ValueError):
        return np.nan
    # Every decade is labelled start-(start + 10), e.g. 1950-1960
    return f'{start}-{start + 10}'

df['Decade of Release'] = df['Year'].map(decade)

Python
View
generate functionWed, 25 Jan 2023

There is a list with lists inside [['/world/'], ['/latest/'], ['/?updated=top'], ['/politics/36188461-s-marta-zhizn-rossiyan-suschestvenno-izmenitsya-iz-za-novyh-zakonov/'] ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/'], ['/science/36157853-nasa-sobiraet-ekstrennuyu-press-konferentsiyu-na-temu-vnezemnoy-zhizni/'], ['/video/36001498-poyavilis-pervye-podrobnosti-gibeli-natali-melamed/'], ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/?smi2=1'] ['/science/'], ['/sport/'], ['/middleeast/36131117-divizion-s-400-ne-zametil-ataki-f-35-pod-damaskom/'], ['/economics/36065674-rossiyane-vozmutilis-minimalnymi-zarplatami-v-stranah-es/']] 1) Modify the list to the Pandas dataframe 2) Filter out and leave only the url's with the news sctructure (containing 8 digits and heading) in it, using the str.contains method

import pandas as pd list = [['/world/'], ['/latest/'], ['/?updated=top'], ['/politics/36188461-s-marta-zhizn-rossiyan-suschestvenno-izmenitsya-iz-za-novyh-zakonov/'] ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/'], ['/science/36157853-nasa-sobiraet-ekstrennuyu-press-konferentsiyu-na-temu-vnezemnoy-zhizni/'], ['/video/36001498-poyavilis-pervye-podrobnosti-gibeli-natali-melamed/'], ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/?smi2=1'] ['/science/
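The generated answer above is cut off before the filtering step. A minimal sketch of both steps follows, using a shortened sample with the missing commas between sublists restored. It assumes a news URL looks like /section/ followed by eight digits, a hyphen and a slug; the column name `news_title` and the variable names are illustrative.

```python
import pandas as pd

urls = [['/world/'], ['/latest/'], ['/?updated=top'],
        ['/politics/36188461-s-marta-zhizn-rossiyan-suschestvenno-izmenitsya-iz-za-novyh-zakonov/'],
        ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/'],
        ['/science/'], ['/sport/']]

# 1) list of lists -> dataframe with a single column
df = pd.DataFrame(urls, columns=['news_title'])

# 2) keep only rows matching /section/12345678-some-heading
news_pattern = r'^/[a-z]+/\d{8}-[a-z0-9-]+'
news = df[df['news_title'].str.contains(news_pattern, regex=True)].reset_index(drop=True)
print(news)
```

Naming the variable `urls` instead of `list` avoids shadowing the built-in type.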

Python
View
generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut', 'num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

# Lift the display limit so that all 66 rows are printed instead of a truncated view
with pd.option_context('display.max_rows', None):
    print(year_leaders(dfp))

Python
View
generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

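ax = plt.gca()
ax.grid(axis='y', linestyle='--', alpha=0.5)
# Percentage difference of 'mineral' relative to 'nutri', written on the mineral bars
# (bar_label requires matplotlib 3.4 or newer)
diff = (data['mineral'] - data['nutri']) / data['nutri'] * 100
ax.bar_label(ax.containers[-1], labels=[f'{d:+.1f}%' for d in diff])
plt.title('Percentage difference between nutrient and mineral content for each vegetable')
plt.xlabel('Vegetable')
plt.ylabel('Amount')
plt.legend()
plt.show()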

Python
View
generate functionWed, 08 Feb 2023

def get_needed_posts(query): for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) Modify the Python function above so that if it finds a repeated title or link, it will not add it to the return

def search(query): query = query.lower() site = pd.DataFrame() for page in range(0, 10): if page == 0: url = 'https://habr.com/ru/all/' else: url = 'https://habr.com/ru/all/page' + str(page) + '/' req = requests.get(url) soup = BeautifulSoup(req.text, 'html.parser') articles = soup.find_all('li', class_='content-list__item_post') site = pd.concat([site, get_needed_posts(query, articles)]) return site.reset_index(drop=True) def get_needed_posts(query, articles): site = pd.DataFrame() for article in articles: title, date, link = get_article_info(article) if not title in list(site['title']) and not link in list(site['link']): row = {'
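The generated answer above is cut off before the duplicate check is complete. The core idea can be sketched without pandas or the scraping code: remember the titles and links already kept, and skip any row that repeats either. `drop_repeated_posts` is a hypothetical helper name, and the rows are assumed to be dicts shaped like the `row` in the prompt.

```python
def drop_repeated_posts(rows):
    seen_titles, seen_links = set(), set()
    unique = []
    for row in rows:
        # skip the row if either its title or its link was already kept
        if row['title'] in seen_titles or row['link'] in seen_links:
            continue
        seen_titles.add(row['title'])
        seen_links.add(row['link'])
        unique.append(row)
    return unique


posts = [
    {'date': '2023-02-08', 'title': 'First', 'link': 'https://habr.com/a'},
    {'date': '2023-02-08', 'title': 'First', 'link': 'https://habr.com/b'},
    {'date': '2023-02-07', 'title': 'Second', 'link': 'https://habr.com/a'},
    {'date': '2023-02-06', 'title': 'Third', 'link': 'https://habr.com/c'},
]
print([post['title'] for post in drop_repeated_posts(posts)])
```

On a dataframe, calling drop_duplicates once with subset='title' and once with subset='link' should have a similar effect.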

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 Both titles and ratings in this table are not unique; they are duplicated many times. Write a function that creates a new dataframe with two columns: - In the first column it groups the unique titles of the movies - In the 2nd column displays their average rating

import pandas as pd

ratings = pd.read_csv('ratings.csv')

def average_ratings(df):
    # One row per unique title, with the mean of its ratings in the second column
    return df.groupby('title', as_index=False)['rating'].mean()

average_ratings(ratings)

Python
View
generate functionTue, 10 Jan 2023

import numpy as np def sum_matrix(N): my_matrix = np.diag(np.arange(N-1, -1, -1), k=0) return np.trace(my_matrix) print(sum_matrix(5)) print(sum_matrix(10)) print(sum_matrix(15)) Create in Numpy a diagonal matrix with elements from N to 0. Count the sum of its values on the diagonal.

import numpy as np

def sum_matrix(N):
    # np.arange(N, -1, -1) yields N, N-1, ..., 0, so the diagonal really starts at N
    my_matrix = np.diag(np.arange(N, -1, -1), k=0)
    return np.trace(my_matrix)

print(sum_matrix(5))
print(sum_matrix(10))
print(sum_matrix(15))

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

df = df[df.duplicated(subset=["name"], keep=False)].sort_values("name")

Python
View
generate functionWed, 22 Feb 2023

Modify the code below to have column captions for the x-axis values question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

def create_plot(question, title, figure_template):
    answers = df[question].value_counts()
    # text= writes each count on its bar, so the x-axis values are captioned
    fig = figure_template(x=answers.values, y=answers.index, orientation='h', text=answers.values)
    fig.update_layout(title_text=title)
    fig.show()

# The Russian title repeats the question: which working environment is most preferable for you?
create_plot('What is the most preferred working environment for you.', 'Какая рабочая среда для вас наиболее предпочтительна?', px.bar)

Python
View
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that shows the top 20 rows from the country column sorted by av_temp values in descending order.

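def top(df, column_name, n=20):
    # Sort by the chosen column and show only the country next to it
    return df.sort_values(by=column_name, ascending=False).head(n)[['country', column_name]]

top(df, 'av_temp')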

Python
View
generate functionWed, 25 Jan 2023

There is a Pandas dataframe news_title 0 /world/ 1 /latest/ 2 /?updated=top 3 /politics/36188461-s-marta-zhizn-rossiyan-susc... 4 /world/36007585-tramp-pridumal-kak-reshit-ukra... 5 /science/36157853-nasa-sobiraet-ekstrennuyu-pr... 6 /video/36001498-poyavilis-pervye-podrobnosti-g... 7 /world/36007585-tramp-pridumal-kak-reshit-ukra... 8 /science/ 9 /sport/ Filter out this dataframe and leave only the url's with the news sctructure (containing 8 digits and heading) in it, using the str.contains method

import pandas as pd

df = pd.read_csv('news.csv', delimiter='\t')

# A news URL has the form /section/ followed by 8 digits, a hyphen and the heading
news_pattern = r'^/[a-z]+/\d{8}-[\w-]+'
df[df['news_title'].str.contains(news_pattern, regex=True)]

Python
View
generate functionTue, 21 Mar 2023

There are 3 lists: water, nutri and mineral. They differ in length: print(len(water)) - 20 print(len(nutri)) -21 print(len(mineral)) -21 add them to df Pandas dataframe in the following format: index treatments value 0 0 water 25 1 1 water 30 2 2 water 28 3 3 nutri 36 4 4 nutri 29 5 0 nutri 45 6 1 mineral 55 7 2 mineral 29 8 3 mineral 56

import pandas as pd

# Build one dataframe per list, then concatenate; the lists may differ in length
def df_maker(values, name):
    return pd.DataFrame({'index': range(len(values)), 'treatments': name, 'value': values})

combined_df = pd.concat([df_maker(water, 'water'),
                         df_maker(nutri, 'nutri'),
                         df_maker(mineral, 'mineral')], ignore_index=True)
combined_df

Python
View
generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=label, y=counts) fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe: hotel how_find_us Alpina agg_101hotels.com 2 agg_airbnb.com 4 agg_booking.com 3 agg_bronevik.com 1 agg_hotellook.ru 4 agg_level.travel 2 agg_onetwotrip.com 2 agg_onlinetours.ru 2 agg_ostrovok.ru 1 agg_other_sites 1 agg_roomguru.ru 2 agg_sutochno.ru 6 agg_travelata.ru 3 agg_tripadvisor.ru 3 agg_trivago.com 6 agg_tvil.ru 3 agg_yandex_travel 5 by_recommendation 8 facebook_adv 1 google_adv 1 instagram_adv 6 outdoor 6 regular_customer 1 seo 5 social 3 telegram_adv 2 tour_agents 3 unknown 4 vk_adv 3 yandex_adv 10 Write a function that creates a new dataframe by grouping the values in the "how_find_us" column according to the following rules: 1) if there are "agg" in the row, replace all row values with "aggregators" 2) if there are words "facebook" or "vk" or "instagram" or "telegram" in the line, replace these values with "social"

def find_us(df):
    def group(source):
        if 'agg' in source:
            return 'aggregators'
        if any(word in source for word in ('facebook', 'vk', 'instagram', 'telegram')):
            return 'social'
        return source

    # Return a new dataframe and apply the rules to every hotel, not only Alpina
    result = df.copy()
    result['how_find_us'] = result['how_find_us'].map(group)
    return result

Python
View
generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

def func(x, pos): # formatter function takes tick label and tick position s = '{:0,d}'.format(int(x)) return s import matplotlib.ticker as ticker formatter = ticker.FuncFormatter(func) # make formatter plt.gca().yaxis.set_major_formatter(formatter) # set formatter to needed axis

Python
View
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incident_types with a value >= 5. Sort them in descending order by incident_type column

df_new.groupby(['route', 'incident_type']).count()['id'] df_new.groupby(['route', 'incident_type'])['id'].count() df_new.groupby(['route', 'incident_type'])['id'].size() df_new[df_new['incident_type'] >= 5].groupby(['route', 'incident_type']).count()['id'] df_new[df_new['incident_type'] >= 5].groupby(['route', 'incident_type']).size() df_new[df_new['incident_type'] >= 5].groupby(['route', 'incident_type'])['id'].size() df_new[df_new['incident_type'] >= 5].groupby(['route', 'incident_type'])['id'].size().sort_values(ascending=False)

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

def decade_of_release(year): if type(year) == int: if year < 1900: return "1800-1900" if year >= 1900 and year < 1910: return "1900-1910" if year >= 1910 and year < 1920: return "1910-1920" if year >= 1920 and year < 1930: return "1920-1930" if year >= 1930 and year < 1940: return "1930-1940" if year >= 1940 and year < 1950: return "1940-1950" if year >= 1950 and year < 1960: return "1950-1960" if year >= 1960 and year < 1970: return "1960-1970" if year >= 1970 and year < 1980: return "1970-1980" if year >= 1980 and year < 1990: return "1980-1990" if year >= 1990 and year < 2000: return "1990-2000" if year >= 2000 and year < 2010: return "2000-2010"
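The chain of comparisons above stops at 2010 and never returns NaN for non-numeric input. A shorter sketch, assuming every decade is labelled start-(start + 10) as in the examples, derives the label arithmetically; this is an alternative to the generated answer, not a completion of it.

```python
import math


def decade_of_release(year):
    try:
        # floor the year to the start of its decade
        start = int(year) // 10 * 10
    except (TypeError, ValueError):
        # anything that is not a year (text, None, NaN) maps to NaN
        return math.nan
    return f"{start}-{start + 10}"


for value in (1889, 1954, 2011, "raovyavas"):
    print(value, decade_of_release(value))
```

With pandas it could be applied as df['Decade of Release'] = df['Year'].map(decade_of_release).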

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the "timestamp" column for a given "userId".

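import pandas as pd

df = pd.DataFrame({'userId': [1, 1, 1, 1, 1],
                   'movieId': [31, 1029, 1061, 1129, 1172],
                   'rating': [2.5, 3, 3, 2, 4],
                   'timestamp': [1260759144, 1260759179, 1260759182, 1260759185, 1260759205]})

# Lifetime per user in seconds (max timestamp minus min), then the average over users
lifetime = df.groupby('userId')['timestamp'].agg(lambda ts: ts.max() - ts.min())
lifetime.mean()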

Python
View
generate functionThu, 30 Mar 2023

There is a matrix A1 = array([[0, 0, 0, 0, 1, 0], [0, 1, 0, 1, 0, 0], [0, 0, 1, 1, 1, 0], [1, 0, 0, 1, 0, 1]]) Multiply the vector vector = [1,2,3,4] by the matrix A1

import numpy as np

def matrix_multiplication(matrix, vector):
    # A length-4 vector times a 4x6 matrix gives a length-6 vector
    return np.asarray(vector) @ np.asarray(matrix)

vector = [1, 2, 3, 4]
A1 = np.array([[0, 0, 0, 0, 1, 0],
               [0, 1, 0, 1, 0, 0],
               [0, 0, 1, 1, 1, 0],
               [1, 0, 0, 1, 0, 1]])
matrix_multiplication(A1, vector)  # array([4, 2, 3, 9, 4, 4])

Python
View
generate functionSat, 25 Feb 2023

x1 = (round(g1['rate_group'].value_counts() / g1['rate_group'].value_counts().sum() * 100, 2)) x2 = (round(g2['rate_group'].value_counts() / g2['rate_group'].value_counts().sum() * 100, 2)) x1.plot(kind='hist', alpha=0.5, bins=6, density=True) x2.plot(kind='hist', alpha=0.5, bins=6, density=True) plt.legend(loc='upper left') plt.title('Сравнение распределений с собственным жильем и без') Add to the code above the caption of the legends on the chart: - for x1 = 'column1' - for x2 = 'column2'

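# Pass label= to each plot call so that plt.legend() picks up the captions
x1.plot(kind='hist', alpha=0.5, bins=6, density=True, label='column1')
x2.plot(kind='hist', alpha=0.5, bins=6, density=True, label='column2')
plt.legend(loc='upper left')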

Python
View
generate functionThu, 30 Mar 2023

There are two vectors: film_2 = np.array([0,1,0,1,0,0]) film_4 = np.array([1,0,0,1,0,1]) We know that the angle between them is 65.9 degrees. Construct a graph with a visual representation of the vectors and the angle between them.

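import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Arc

film_2 = np.array([0, 1, 0, 1, 0, 0])
film_4 = np.array([1, 0, 0, 1, 0, 1])
angle = 65.9  # degrees, given in the task

# Six-dimensional vectors cannot be drawn directly, so film_2 is placed along
# the x-axis and film_4 at the given angle, both keeping their original lengths
theta = np.radians(angle)
v2 = np.linalg.norm(film_2) * np.array([1.0, 0.0])
v4 = np.linalg.norm(film_4) * np.array([np.cos(theta), np.sin(theta)])

ax = plt.axes()
ax.quiver([0, 0], [0, 0], [v2[0], v4[0]], [v2[1], v4[1]],
          angles='xy', scale_units='xy', scale=1, color=['darkblue', 'brown'])
ax.add_patch(Arc((0, 0), 1, 1, theta1=0, theta2=angle))
ax.text(0.6, 0.25, f'{angle}°')
ax.set_xlim(-0.5, 2)
ax.set_ylim(-0.5, 2)
ax.set_aspect('equal')
plt.show()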

Python
View
generate functionWed, 21 Dec 2022

Change this function so that the employee's awards are also taken into account (add this variable to the class) and the employee's promotion condition is taken into account by the formula if (self.seniority)+(self.intlawards*2) % 7 == 0: self.grade_up() class Developer(Employee): def __init__(self, name, seniority): super().__init__(name, seniority) def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all of the developers pass the accreditation self.seniority += 1 # condition of promoting an employee from the presentation if self.seniority % 5 == 0: self.grade_up() # publication of the results return self.publish_grade()

class Developer(Employee):
    def __init__(self, name, seniority, awards):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # for each accreditation, increase the counter by 1
        # for now we assume that all of the developers pass the accreditation
        self.seniority += 1
        # condition of promoting an employee from the presentation;
        # the sum is parenthesised because % binds tighter than +
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        # publication of the results
        return self.publish_grade()

Python
View
generate functionFri, 23 Dec 2022

You are given two sorted arrays that both only contain integers. Your task is to find a way to merge them into a single one, sorted in asc order. Complete the function mergeArrays(arr1, arr2), where arr1 and arr2 are the original sorted arrays. You don't need to worry about validation, since arr1 and arr2 must be arrays with 0 or more Integers. If both arr1 and arr2 are empty, then just return an empty array. Note: arr1 and arr2 may be sorted in different orders. Also arr1 and arr2 may have same integers. Remove duplicated in the returned result. Examples (input -> output) * [1, 2, 3, 4, 5], [6, 7, 8, 9, 10] -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * [1, 3, 5, 7, 9], [10, 8, 6, 4, 2] -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * [1, 3, 5, 7, 9, 11, 12], [1, 2, 3, 4, 5, 10, 12] -> [1, 2, 3, 4, 5, 7, 9, 10, 11, 12]

def merge_arrays(arr1, arr2):
    # combine both inputs and drop duplicates with a set, then sort ascending;
    # the sort has to come last because a set does not keep any order
    return sorted(set(arr1 + arr2))
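
Since both inputs are already sorted, the merge can also be done in one linear pass. The following is only a sketch of that alternative; the name merge_arrays_linear is hypothetical, and it assumes each input is sorted either ascending or descending, as the task states:

```python
from heapq import merge


def merge_arrays_linear(arr1, arr2):
    def ascending(arr):
        # A descending input is reversed so that both inputs run low to high
        if len(arr) > 1 and arr[0] > arr[-1]:
            return arr[::-1]
        return list(arr)

    result = []
    # heapq.merge lazily merges already-sorted iterables into one sorted stream
    for value in merge(ascending(arr1), ascending(arr2)):
        # Equal values arrive next to each other, so comparing with the
        # last kept value is enough to drop duplicates
        if not result or result[-1] != value:
            result.append(value)
    return result


print(merge_arrays_linear([1, 3, 5, 7, 9], [10, 8, 6, 4, 2]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Unlike extending arr1 in place, this version leaves both input lists unchanged.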

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer time_on_chart min max "Groove" Holmes 1 11 "Little" Jimmy Dickens 1 10 "Pookie" Hudson 1 1 "Weird Al" Yankovic 1 20 Sort these rows by the time_on_chart and max columns, in descending order, and display the first 20 rows

import pandas as pd

df = pd.read_csv("https://www.dropbox.com/s/jr9c7rwhi8hvuk7/performers.csv?dl=1")
# sort by time_on_chart, then by max, both descending, and show the first 20 rows
df.sort_values(by=['time_on_chart', 'max'], ascending=False).head(20)

Python
generate functionWed, 21 Dec 2022

Change the class so that when self.seniority is added, the designer is assigned 1 point and when self.intlawards is added, the designer is assigned 2 points. Correct the condition so that these points add up correctly: if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() Example class: class Designer(Employee): def __init__(self, name, seniority, awards=2): super().__init__(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

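class Designer(Employee):
    def __init__(self, name, seniority, awards=2):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # each accreditation is worth 1 point
        self.seniority += 1
        # each international award is worth 2 points; the total is kept in a
        # local variable so that self.seniority is not inflated on every call
        points = self.seniority + self.intlawards * 2
        if points % 7 == 0:
            self.grade_up()
        return self.publish_grade()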

Python
generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) calculate the correlation of this column with other columns: pulse, respiratory_rate, extremities_temp, pain_outcome 3) fill all NaNs in the 'rectal_temp' column with the average values multiplied by the correlation weight

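def fill_na(df, column_name='rectal_temp'):
    # correlation of the target column with every other column
    # (the column's correlation with itself is dropped)
    corr = df.corr()[column_name].drop(column_name)
    # column mean scaled by the average correlation weight
    new_value = corr.mean() * df[column_name].mean()
    df[column_name] = df[column_name].fillna(new_value)
    return df

fill_na(df2)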

Python
generate functionSat, 04 Mar 2023

Correct the code to correctly output the dataframe with this data: gdpdiff = pd.DataFrame({'Top1': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1], 'Top20': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20]}) gdpdiff

def df_gdp_diff(df19):
    # the 20 rows with the highest GDP per capita (needs at least 20 rows)
    top20 = df19[['Country or region', 'GDP per capita']]\
        .sort_values(by='GDP per capita', ascending=False).head(20)
    # iloc returns each row as a Series indexed by column name, so the two
    # Series share an index and line up as the columns Top1 and Top20
    gdpdiff = pd.DataFrame({'Top1': top20.iloc[0], 'Top20': top20.iloc[19]})
    return gdpdiff

Python
generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)


Python
generate functionFri, 03 Feb 2023

There is a df Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of strings for all "operator"

df["operator"].apply(len).mean()

Python
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incident_types with a value >= 5. Sort them in descending order by incident_type column

def df_incident_type(df):
    # number of recorded incident_type values per route
    counts = df.groupby('route')['incident_type'].count().reset_index()
    # keep the routes with at least 5 incidents, largest count first
    counts = counts[counts['incident_type'] >= 5]
    return counts.sort_values('incident_type', ascending=False)

Python
generate functionWed, 21 Dec 2022

Change the class so that when self.seniority is added, the designer is assigned 1 point and when self.intlawards is added, the designer is assigned 2 points. Correct the condition so that these points add up correctly: if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() Example class: class Designer(Employee): def __init__(self, name, seniority, awards=2): super().__init__(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

class Designer(Employee):
    def __init__(self, name, seniority, awards=2):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # for each accreditation, increase the counter by 1
        # for now we assume that all designers are accredited
        self.seniority += 1
        # condition for promoting an employee from the presentation:
        # seniority counts 1 point per step, each award counts 2 points
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        # publication of the results
        return self.publish_grade()

Python
generate functionWed, 08 Mar 2023

Write code that adds money +=1 on any number from the list of winnums money = 0 winnums = [777, 999, 555, 333, 111, 177, 277, 377, 477, 577, 677, 877, 977, 100, 200, 300, 400, 500, 600, 700, 800, 900, 110, 120, 130, 140, 150, 160, 170, 180, 190, 210, 220, 230, 240, 250, 260, 270, 280, 290, 310, 320, 330, 340, 350, 360, 370, 380, 390, 410, 420, 430, 440, 450, 460, 470, 480, 490, 510, 520, 530, 540, 550, 560, 570, 580, 590, 610, 620, 630, 640, 650, 660, 670, 680, 690, 710, 720, 730, 740, 750, 760, 770, 780, 790, 810, 820, 830, 840, 850, 860, 870, 880, 890, 910, 920, 930, 940, 950, 960, 970, 980, 990, 107, 117, 127, 137, 147, 157, 167, 177, 187, 197, 207, 217, 227, 237, 247, 257, 267, 277, 287, 297, 307, 317, 327, 337, 347, 357, 367, 377, 387, 397, 407, 417, 427, 437, 447, 457, 467, 477, 487, 497, 507, 517, 527, 537, 547, 557, 567, 577, 587, 597, 607, 617, 627, 637, 647, 657, 667, 677, 687, 697, 707, 717, 727, 737, 747, 757, 767, 787, 797, 807, 817, 827, 837, 847, 857, 867, 877, 887, 897, 907, 917, 927, 937, 947, 957, 967, 977, 987, 997, 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090]

for i in winnums: money += 1 print(money)
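
The loop above adds 1 for every entry of the list, whatever number was drawn. Note also that literals such as 007 or 010 are a syntax error in Python 3, so the list from the prompt cannot be pasted as integers. A hedged sketch of a membership check, assuming the numbers are stored as three-character strings; the short winnums subset and the name add_win are hypothetical:

```python
# Hypothetical short subset of the winning numbers, stored as strings
# so that values with leading zeros such as "007" stay valid
winnums = {"777", "999", "100", "007", "010"}


def add_win(money, number):
    # Normalize the drawn number to three characters, e.g. 7 -> "007"
    key = str(number).zfill(3)
    # Add 1 only when the drawn number is one of the winning numbers
    return money + 1 if key in winnums else money


money = 0
for drawn in [777, 7, 123, "010"]:
    money = add_win(money, drawn)
print(money)  # 3
```

A set is used because membership tests on it do not scan the whole collection.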

Python
generate functionThu, 22 Dec 2022

Task Given three integers a ,b ,c, return the largest number obtained after inserting the following operators and brackets: +, *, () In other words , try every combination of a,b,c with [*+()] , and return the Maximum Obtained (Read the notes for more detail about it) Example With the numbers are 1, 2 and 3 , here are some ways of placing signs and brackets: 1 * (2 + 3) = 5 1 * 2 * 3 = 6 1 + 2 * 3 = 7 (1 + 2) * 3 = 9 So the maximum value that you can obtain is 9. Notes The numbers are always positive. The numbers are in the range (1  ≤  a, b, c  ≤  10). You can use the same operation more than once. It's not necessary to place all the signs and brackets. Repetition in numbers may occur . You cannot swap the operands. For instance, in the given example you cannot get expression (1 + 3) * 2 = 8.

def expression_matter(a, b, c): return max([a * b * c, a * (b + c), (a + b) * c, a + b + c])
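
The answer above compares four expressions, although six placements of + and * are possible without swapping operands. A quick exhaustive check over the stated range (1 to 10) suggests the two omitted ones never win; the helper name all_placements is an assumption made for this sketch:

```python
from itertools import product


def expression_matter(a, b, c):
    return max(a * b * c, a * (b + c), (a + b) * c, a + b + c)


def all_placements(a, b, c):
    # Every distinct way to place + and * with optional brackets,
    # keeping the operands in their original order
    return max(
        a + b + c,
        a + b * c,
        a * b + c,
        a * b * c,
        (a + b) * c,
        a * (b + c),
    )


# Compare both functions on every triple in the allowed range
mismatches = [
    (a, b, c)
    for a, b, c in product(range(1, 11), repeat=3)
    if expression_matter(a, b, c) != all_placements(a, b, c)
]
print(len(mismatches))  # 0
```

The reason is that for positive integers a + b * c is never larger than (a + b) * c, and a * b + c is never larger than a * (b + c).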

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Sort the name values in ascending order

import pandas as pd df = pd.DataFrame([ [1, "Ksenia Rodionova", "2021-07-01", "Alpina", 1639.000000, "by_recommendation", 48, 3.0], [2, "Ulyana Selezneva", "2021-07-01", "AquaMania", 930.000000, "by_airbnb.com", 97, 4.0], [3, "Konstantin Prokhorov", "2021-07-01", "Breeze", 1057.720000, "agg_trivago.com", 173, 4.0], [4, "Petrov Vladimir", "2021-07-01", "Moreon", 1403.000000, "agg_onlinetours.ru", 229, 4.0], [5, "Arina Selivanova", "2021-07-01", "Alpina", 1639.000000, "agg_sutochno.ru", 63, 4.0], [6

Python
generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

plt.figure(figsize=(12, 4))
plt.bar(data.index, data['nutri'], color='darkblue', label='nutri')
plt.bar(data.index, data['mineral'], color='brown', label='mineral')
# horizontal grid lines
plt.grid(axis='y', linestyle='--', alpha=0.5)
# percentage difference of mineral relative to nutri, written above each bar
# (assumes data['nutri'] contains no zeros)
for x, nutri, mineral in zip(data.index, data['nutri'], data['mineral']):
    diff = (mineral - nutri) / nutri * 100
    plt.text(x, max(nutri, mineral), '{:.2f}%'.format(diff), ha='center', va='bottom')
plt.legend()
plt.title('Nutri vs Mineral', size=20)
plt.xlabel('Country', size=20)
plt.ylabel('Percentage', size=20)
plt.show()

Python
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

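def high_rating(df, threshold=6.5):
    # movies whose rating is above the threshold; adjust the threshold
    # to the rating scale actually used in the dataframe
    high = df[df['rating'] > threshold]
    counts = high['decade'].value_counts()
    # decade with the most high-rated movies, or None if there are none
    return counts.idxmax() if not counts.empty else None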

Python
generate functionFri, 23 Dec 2022

Write a method, that will get an integer array as parameter and will process every number from this array. Return a new array with processing every number of the input-array like this: If the number has an integer square root, take this, otherwise square the number. Example [4,3,9,7,2,1] -> [2,9,3,49,4,1] Notes The input array will always contain only positive numbers, and will never be empty or null.

import math

def square_or_square_root(arr):
    new_arr = []
    for i in arr:
        # math.isqrt works on integers, so there is no floating-point rounding
        root = math.isqrt(i)
        if root * root == i:
            new_arr.append(root)
        else:
            new_arr.append(i ** 2)
    return new_arr

Python
generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

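def replace(stnums, students, replacing_num):
    seen = set()
    for student in students:
        num = student[0]
        if num in stnums:
            if num in seen:
                # second or later occurrence: replace it
                student[0] = replacing_num
            else:
                # first occurrence: leave it untouched
                seen.add(num)
    return students

replace(stnums, students, replacing_num)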

Python
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def delete(df):
    # na=False treats missing values (e.g. an empty Class 2) as "no match"
    mask = df['Class 1'].str.contains('Rock', na=False) | df['Class 2'].str.contains('Rock', na=False)
    return df[~mask]

delete(grass)

Python
generate functionThu, 19 Jan 2023

There is a Pandas dataframe: place decade perc_of_5star 0 1 2010-2020 2.300 1 2 1900-1910 1.379 2 3 1970-1980 1.179 3 4 2000-2010 1.176 4 5 1960-1970 1.133 build a horizontal barchart with perc_of_5star columns in descending order and decade values on the y-axis

def plot_barchart(df): df = df.sort_values('perc_of_5star', ascending = False).head(10) return df.plot.barh(x = 'decade', y = 'perc_of_5star', title = '% 5-star ratings by decade'); plot_barchart(df)

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

import pandas as pd performer = ['Glee Cast', 'Taylor Swift', 'Drake', 'YoungBoy Never Broke Again', 'Aretha Franklin', 'The Beatles'] hits = ['Somebody To Love', 'Friday', 'Loser Like Me', 'Baby', 'I Want You Back', 'Kacey Talk', 'Put It On Me', 'Dirty Iyanna', 'Lil Top', 'London Boy', 'Teardrops On My Guitar', 'Fifteen', 'Summer Sixteen', 'The Language', 'Weston Road Flow', 'Sgt. Pepper\'s Lonely Hearts Club Band/With A Little Help From My Friends'] chart_debut = [2009, 2008, 2016, 2020, 1967, 1978] time_on_chart = [290, 14299, 7449, 1012, 3490, 3548] consecutive_weeks = [47.0, 11880.0, 6441.0, 625.0, 2921.0, 2798.0] decade = ['2000-2010', '2000-2010', '2010-2020', '2020-2030

Python
generate functionSat, 25 Feb 2023

There is a Pandas dataframe: id grade rate_group 0 1077501 B 10-11% 1 1077430 C 15-17% 2 1077175 C 15-17% 3 1076863 C 13-14% 4 1075358 B 12-13% 5 1075269 A 7-8% 6 1069639 C 15-17% 7 1072053 E 17-25% 8 1071795 F 17-25% 9 1071570 B 12-13% Modify the dataframe so that grades are column names and rate_group are row names. Inside the rows should be the sum of the id by grouping grades and rate_group.

import pandas as pd

def group(df):
    # rows: rate_group, columns: grade, cells: sum of id
    df = df.pivot_table(index='rate_group', columns='grade', values='id', aggfunc='sum')
    return df

Python
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

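def delete_grass(pokemon):
    # rows where either class column is 'Rock'
    mask = (pokemon['Class 1'] == 'Rock') | (pokemon['Class 2'] == 'Rock')
    # drop those rows in place and return the same dataframe
    pokemon.drop(pokemon.index[mask], inplace=True)
    return pokemon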

Python
generate functionTue, 21 Mar 2023

Rewrite this code to calculate statistics by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

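from scipy.stats import wilcoxon

gate_30 = data[data['version'] == 'gate_30']['sum_gamerounds']
gate_40 = data[data['version'] == 'gate_40']['sum_gamerounds']
# wilcoxon is the paired signed-rank test, so both samples must have the
# same length; for independent samples of different sizes,
# scipy.stats.ranksums is the rank-sum variant
stat, p = wilcoxon(gate_30, gate_40)
print('Wilcoxon Statistics=%.3f, p=%.3f' % (stat, p))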

Python
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 in this dataframe the names of films are not unique, because the same film may have been given different ratings Your task is to write a function that calculates the average rating of each movie and removes non-unique strings

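def unique(data: pd.DataFrame) -> pd.DataFrame:
    # one row per title with the mean of all its ratings
    return data.groupby('title', as_index=False)['rating'].mean()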

Python
generate functionWed, 22 Feb 2023

Modify this code to make a vertical bar graph instead of a pie chart (plotly.express library) question6 = "How likely would you work for a company whose mission is not bringing social impact ?" question6 = data[question6].value_counts() label = question6.index counts = question6.values colors = ['gold','lightgreen'] fig = go.Figure(data=[go.Pie(labels=label, values=counts)]) fig.update_layout(title_text='How likely would you work for a company whose mission is not bringing social impact?') fig.update_traces(hoverinfo='label+value', textinfo='percent', textfont_size=30, marker=dict(colors=colors, line=dict(color='black', width=3))) fig.show()


Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Leave only 1 unique row in each 'song' column in case the 'peak_position' column has a value of 1

def chart_peak(df):
    # keep only the rows where the song reached peak_position 1
    top = df[df['peak_position'] == 1]
    # one row per song; the input dataframe is left unchanged
    return top.drop_duplicates('song', keep='first')

Python
generate functionTue, 13 Dec 2022

Create the function to change the element positions with the same exact logics: Input: ["tail", "body", "head"] Output: ["head", "body", "tail"] Input: ["tails", "body", "heads"] Output: ["heads", "body", "tails"] Input: ["bottom", "middle", "top"] Output: ["top", "middle", "bottom"] Input: ["lower legs", "torso", "upper legs"] Output: ["upper legs", "torso", "lower legs"] Input: ["ground", "rainbow", "sky"] Output: ["sky", "rainbow", "ground"]

def switch_elements(arr): return [arr[-1]] + arr[1:-1] + [arr[0]]

Python
generate functionTue, 13 Dec 2022

Create the function that find the coefficients of quadratic equation of the given two roots (x1 and x2). Equation will be the form of ax^2 + bx + c = 0 Return type is a Vector containing coefficients of the equations in the order (a, b, c). Since there are infinitely many solutions to this problem, we fix a = 1. Remember, the roots can be written like (x-x1) * (x-x2) = 0 Example quadratic(1,2) = (1, -3, 2) This means (x-1) * (x-2) = 0; when we do the multiplication this becomes x^2 - 3x + 2 = 0 Example 2 quadratic(0,1) = (1, -1, 0) This means (x-0) * (x-1) = 0; when we do the multiplication this becomes x^2 - x + 0 = 0 Notes Inputs will be integers. When x1 == x2, this means the root has the multiplicity of two

def quadratic(x1, x2): return (1, -x1 - x2, x1 * x2)
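
The coefficients come from expanding (x - x1) * (x - x2) = x^2 - (x1 + x2)x + x1*x2. One way to sanity-check them is to substitute each root back into the polynomial; the helper evaluate and the extra test pairs are assumptions added for this sketch:

```python
def quadratic(x1, x2):
    return (1, -x1 - x2, x1 * x2)


def evaluate(coeffs, x):
    # Value of a*x^2 + b*x + c at the point x
    a, b, c = coeffs
    return a * x * x + b * x + c


# Each root must make the polynomial evaluate to zero
for x1, x2 in [(1, 2), (0, 1), (3, 3), (-4, 5)]:
    coeffs = quadratic(x1, x2)
    assert evaluate(coeffs, x1) == 0
    assert evaluate(coeffs, x2) == 0

print(quadratic(1, 2))  # (1, -3, 2)
print(quadratic(0, 1))  # (1, -1, 0)
```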

Python
generate functionMon, 26 Dec 2022

Add to this code the numbering of all group_list values from 1 dct = convert_to_dict(students) def group_list(dct, gr): group_list = [] for k, v in dct.items(): if gr in v: group_list.append(' '.join(dct[k][0:3])) group_list.sort() return group_list group_list(dct, 'BST161')

def group_list(dct, gr):
    names = []
    for k, v in dct.items():
        if gr in v:
            names.append(' '.join(v[0:3]))
    names.sort()
    # number the sorted entries starting from 1
    return ['{}. {}'.format(i, n) for i, n in enumerate(names, start=1)]

group_list(dct, 'BST161')

Python
generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

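def highlight_min_max(df):
    max_props = 'color:white; font-weight:bold; background-color:green;'
    min_props = 'color:white; font-weight:bold; background-color:brown;'
    # each highlight_* call returns the Styler, so the two can be chained;
    # the result is a Styler object (the styled view), not a plain dataframe
    return df.style.highlight_max(axis=1, props=max_props)\
        .highlight_min(axis=1, props=min_props)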

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 2 2021-07-01 Yalta Skyline 15 12 38016 11655 8775 17586 Create a new occupancy_rate column and add a function that will read the values from the formula: ocup_rooms / total_rooms

def occupancy_rate(total_rooms, ocup_rooms): return ocup_rooms / total_rooms df['occupancy_rate'] = occupancy_rate(df['total_rooms'], df['ocup_rooms'])

Python
generate functionWed, 22 Feb 2023

Modify the code below to have column captions for the x-axis values question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='What is the most preferred working environment for you?') fig.show()

import plotly.express as px

def create_bar_chart(x, y, title):
    # text=x prints each bar's x value on the bar as its caption
    fig = px.bar(x=x, y=y, orientation='h', text=x)
    fig.update_layout(title_text=title)
    fig.show()

Python
generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

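import math
from scipy.stats import norm

def interval(n, mean, sig, conf):
    # two-sided interval: mean ± z * sig / sqrt(n),
    # where z is the (1 + conf) / 2 quantile of the standard normal
    z = norm.ppf((1 + conf) / 2)
    h = 2 * z * sig / math.sqrt(n)
    return round(h)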
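
For a known standard deviation, the usual two-sided interval is mean ± z * sig / sqrt(n), so its length is 2 * z * sig / sqrt(n) and does not depend on the mean. A standard-library sketch of that formula follows; the name interval_length and the sample numbers are assumptions made for illustration:

```python
import math
from statistics import NormalDist


def interval_length(n, mean, sig, conf):
    # z is the quantile that leaves (1 - conf) / 2 in each tail
    z = NormalDist().inv_cdf((1 + conf) / 2)
    # Distance between the right and left ends of the interval
    h = 2 * z * sig / math.sqrt(n)
    return round(h)


# Hypothetical sample: n = 100, mean = 50, sig = 15, 95% confidence
print(interval_length(100, 50, 15, 0.95))  # 6
```

Here z is about 1.96, so the unrounded length is roughly 2 * 1.96 * 15 / 10 = 5.88.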

Python
generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.bar(data.index, data['nutri'], color='darkblue', label='nutri')
bars = plt.bar(data.index, data['mineral'], color='brown', label='mineral')
# horizontal grid lines
plt.grid(axis='y', linestyle='--', alpha=0.5)
# percentage difference of mineral relative to nutri, attached to the
# mineral bars (assumes data['nutri'] contains no zeros)
diff = (data['mineral'] - data['nutri']) / data['nutri'] * 100
plt.bar_label(bars, labels=['{:.2f}%'.format(d) for d in diff])
plt.ylabel('Amount')
plt.title('Percentage difference of nutrients and minerals')
plt.legend()
plt.show()

Python
generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='What is the most preferred working environment for you?') fig.show()

def bar_x_axis(df, column_name, title):
    # prepare the data
    question = df[column_name].value_counts()
    label = question.index
    counts = question.values
    # create the figure; text=counts prints each value on its bar
    fig = px.bar(x=counts, y=label, orientation='h', text=counts)
    fig.update_layout(title_text=title)
    fig.show()

bar_x_axis(df, 'What is the most preferred working environment for you.', 'What is the most preferred working environment for you?')

Python
generate functionFri, 23 Dec 2022

Prolog: This kata series was created for friends of mine who just started to learn programming. Wish you all the best and keep your mind open and sharp! Task: Write a function that will accept two parameters: variable and type and check if type of variable is matching type. Return true if types match or false if not. Examples: 42, "int" --> True "42", "int" --> False

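def type_checker(variable, type_name):
    # the second argument is a type name such as "int", so compare it with
    # the name of the variable's type; naming the parameter "type" would
    # shadow the built-in and make type(variable) fail
    return type(variable).__name__ == type_name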
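
In the examples the type arrives as a string such as "int", so it has to be mapped to a type before any comparison. One possible sketch uses an explicit lookup table; the table TYPES and the name type_checker_strict are hypothetical:

```python
# Hypothetical table of the type names the checker accepts
TYPES = {"int": int, "str": str, "float": float, "list": list, "bool": bool}


def type_checker_strict(variable, type_name):
    expected = TYPES.get(type_name)
    # Unknown names never match; "is" compares exact types, so True is
    # not reported as an int even though bool is a subclass of int
    return expected is not None and type(variable) is expected


print(type_checker_strict(42, "int"))    # True
print(type_checker_strict("42", "int"))  # False
```

An explicit table keeps the accepted names under control, at the cost of having to list every supported type.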

Python
generate functionTue, 21 Mar 2023

Rewrite this code to calculate statistics by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

from scipy import stats
import pandas as pd

data = pd.read_csv("cookie_cats.csv")
data.head()

def wilcoxon_rank_sum_test(data):
    """ Returns the result of a Wilcoxon rank-sum test for two independent samples. """
    # Calculate statistic and p-value; ranksums is used because the two
    # groups are independent and may differ in size, which the paired
    # stats.wilcoxon test does not allow
    stat, p = stats.ranksums(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds'])
    # Print result
    print('wilcoxon rank-sum stat=%.3f, p=%.3f' % (stat, p))
    return stat, p

wilcoxon_rank_sum_test(data)

Python
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id from 1 to 23158 for each line of the pandas dataframe

def generate_unique_id(df):
    # consecutive ids starting from 1, one per row
    df['unique_id'] = range(1, len(df) + 1)
    return df

Python
generate functionTue, 21 Mar 2023

There are two samples A and B. Draw boxplots comparing these two samples with a light blue filler and a caption for the names of the samples

def boxplot_compare_distributions(array1, array2, names = ["Sample 1", "Sample 2"]): # Create a figure instance fig = plt.figure(1, figsize=(9, 6)) # Create an axes instance ax = fig.add_subplot(111) # Create the boxplot bp = ax.boxplot([array1, array2]) ## change outline color, fill color and linewidth of the boxes for box in bp['boxes']: # change outline color box.set( color='#7570b3', linewidth=2) # change fill color box.set( facecolor = '#1b9e77' ) ## change color and linewidth of the whiskers for whisker in bp['whiskers']: whisker.set(color='#7570b3', linewidth=2) ## change color and linewidth of the caps for cap in bp['caps'

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

import pandas as pd dfs = pd.read_csv('https://raw.githubusercontent.com/coding-blocks-archives/ML-Noida-2019-June-Two/master/datasets/hot-100.csv', parse_dates=['chart_debut']) dfs['chart_debut'] = dfs['chart_debut'].dt.year dfs.head(10)

Python
generate functionMon, 27 Mar 2023

This code removes stopwords in lists incorrectly: without_sw = [[word for word in text if word not in stopwords_set] for text in only_words_text] It turns out like this: [[g], [u, n, l], [j, u, r, n, g], [p, n, ] It should be like this: [go, until, jurong, point Correct it.

from nltk.corpus import stopwords texts = [['go', 'until', 'jurong', 'point'], ['crazy', 'available', 'only', 'in', 'bugis', 'n', 'great', 'world', 'la', 'e', 'buffet', 'cine', 'there', 'got', 'amore', 'wat']] stopwords_set = set(stopwords.words('english')) without_sw = [word for text in texts for word in text if word not in stopwords_set]
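
The single letters in the prompt's output suggest that each element of only_words_text is a string, so the inner loop walks over characters. The answer above also flattens everything into one list. A hedged sketch that keeps one list per text, assuming the texts are whitespace-separated strings; the tiny stopword set is a stand-in for the real one:

```python
# Hypothetical stand-in for the real stopword set
stopwords_set = {"until", "in", "only", "there"}

# Assumption: each text is one string, not a list of words
only_words_text = ["go until jurong point", "crazy available only in bugis"]

# Split each text into words first, then filter word by word
without_sw = [
    [word for word in text.split() if word not in stopwords_set]
    for text in only_words_text
]
print(without_sw)
# [['go', 'jurong', 'point'], ['crazy', 'available', 'bugis']]
```

If the texts are already lists of words, the split() call is not needed and the original comprehension works as written.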

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Plot a bar chart with performer on the x-axis and number_of_hits on the y-axis.

import matplotlib.pyplot as plt

def plot_num_of_hits(df):
    # Sort so the performers with the most hits come first
    df = df.sort_values("num_of_hits", ascending=False)
    plt.bar(df.performer, df.num_of_hits)
    plt.show()

Python
generate function Sat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Construct a bar graph with 'hotel' values in the x axis and change values in the 'date' column and 'av_revenue' values in the y axis

import matplotlib.pyplot as plt

hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date'])
# Note: this plots av_revenue against date; the hotel is not shown on the x axis
plt.bar(hotels_rev['date'], hotels_rev['av_revenue'], color='red')
plt.xlabel('Date')
plt.ylabel('Average revenues')
plt.title('Average revenues per hotel')
plt.xticks(rotation=90)
plt.show()

Python
generate function Thu, 15 Dec 2022

Create a function that allows you to move doc to another shelf directories. It should check for: 1) if both doc and shelf exist in the dictionary, the function should move doc value to the specified shelf and make return updated directories. 2) if the specified shelf exists in the directories and return ERROR NO SUCH KEY if it doesn't exist 3) if there is a specified doc in the dictionary and return ERROR NO SUCH VALUE if it doesn't exist Note that directories is a dictionary and doc values are in lists! Example: Input: directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } doc = '11-2' shelf = '3' Output: directories = { '1': ['2207 876234'], '2': ['10006'], '3': ['11-2'] }

def move(directories, doc, shelf):
    # The target shelf must exist
    if shelf not in directories:
        return "ERROR NO SUCH KEY"
    # The doc must be on one of the shelves
    if not any(doc in docs for docs in directories.values()):
        return "ERROR NO SUCH VALUE"
    # Take the doc off its current shelf, then put it on the target shelf
    for docs in directories.values():
        if doc in docs:
            docs.remove(doc)
    directories[shelf].append(doc)
    return directories

directories = {'1': ['2207 876234', '11-2'], '2': ['10006'], '3': []}
doc = '11-2'
shelf = '3'
move(directories, doc, shelf)

Python
generate function Fri, 03 Feb 2023

There is a Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id for each line of the pandas dataframe

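# Option 1: the existing index already identifies each row
df.index
# Option 2: store an explicit 1-based id in its own column
# (the year column repeats, so it cannot serve as an id)
df['unique_id'] = range(1, len(df) + 1)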

Python
generate function Sun, 11 Dec 2022

Write a program that will ask the user for the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! cook_book = { 'salad': [ {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'} {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'} {'ingridient_name': 'cucumbers', 'quantity': 20, 'measure': 'g'} ], 'pizza': [ {'ingridient_name': 'cheese', 'quantity': 20, 'measure': 'gr'} {'ingridient_name': 'sausage', 'quantity': 30, 'measure': 'gr'} ], 'lemonade': [ {'ingridient_name': 'lemon', 'quantity': 1, 'measure': 'pcs'} {'ingridient_name': 'water', 'quantity': 200, 'measure': 'ml'} {'ingridient_name': 'sugar', 'quantity': 10, 'measure': 'g'} ] } Enter the number of servings: 3 Result: Cheese: 210 gr. Tomatoes: 6 pcs. Cucumbers: 60g

def total_ingridients(cook_book):
    dish = input("Enter dish: ")
    ingridients = cook_book[dish]
    # Prints every stored value of every ingredient of the chosen dish
    for i in ingridients:
        for value in i.values():
            print(value)

total_ingridients(cook_book)
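
The prompt asks for totals across all dishes scaled by the number of servings. A minimal sketch of that, assuming the dictionary from the prompt with its missing commas restored; the function names total_ingredients and format_totals are hypothetical. It follows the note in the prompt and keys the totals by (name, measure), so cheese in 'g' and cheese in 'gr' stay separate, unlike the 210 gr shown in the sample result. The number of servings is passed as an argument here; in the interactive version it would come from int(input(...)).

```python
from collections import defaultdict

# Sample data copied from the prompt, with the missing commas restored.
cook_book = {
    'salad': [
        {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'},
        {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'},
        {'ingridient_name': 'cucumbers', 'quantity': 20, 'measure': 'g'},
    ],
    'pizza': [
        {'ingridient_name': 'cheese', 'quantity': 20, 'measure': 'gr'},
        {'ingridient_name': 'sausage', 'quantity': 30, 'measure': 'gr'},
    ],
    'lemonade': [
        {'ingridient_name': 'lemon', 'quantity': 1, 'measure': 'pcs'},
        {'ingridient_name': 'water', 'quantity': 200, 'measure': 'ml'},
        {'ingridient_name': 'sugar', 'quantity': 10, 'measure': 'g'},
    ],
}


def total_ingredients(cook_book, servings):
    """Sum quantities over all dishes, keyed by (name, measure)."""
    totals = defaultdict(int)
    for ingredients in cook_book.values():
        for item in ingredients:
            # Same name with a different measure is a different key
            key = (item['ingridient_name'], item['measure'])
            totals[key] += item['quantity'] * servings
    return dict(totals)


def format_totals(totals):
    """Render each total as 'Name: quantity measure'."""
    return [f"{name.capitalize()}: {qty} {measure}"
            for (name, measure), qty in totals.items()]


totals = total_ingredients(cook_book, 3)
for line in format_totals(totals):
    print(line)
```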

Python
generate function Mon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

def func_name(group_number, dct):
    students = []
    for key, value in dct.items():
        if value[4] == group_number:
            students.append(value)
    # Sort by surname, then first name, then patronymic
    students.sort(key=lambda x: x[:3])
    # Number the list starting from 1
    for index, student in enumerate(students, start=1):
        print(index, student[0], student[1], student[2])

func_name('BST162', dct)

Python
generate function Fri, 17 Mar 2023

Determine the sample size needed to study the average coffee bill in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

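import math

def get_sample_size(z=1.96, conf=0.95, sigsqr=1, delta=0.5):
    # z must match conf (1.96 corresponds to conf = 0.95); the result is rounded up to an integer
    return math.ceil((z*z*sigsqr)/(delta*delta))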
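
The default z = 1.96 only matches conf = 0.95. A minimal sketch that derives z from conf instead, assuming the usual normal-approximation formula n = z^2 * sigsqr / delta^2 with a two-sided critical value and using only the standard library; the name sample_size_from_conf is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_from_conf(delta, sigsqr, conf):
    """Sample size for estimating a mean within +/- delta at confidence conf."""
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf((1 + conf) / 2)
    # Round up so the margin of error does not exceed delta
    return math.ceil(z * z * sigsqr / delta ** 2)


print(sample_size_from_conf(delta=0.5, sigsqr=1, conf=0.95))
```

Rounding up rather than to the nearest integer is a design choice here: a smaller sample would leave the margin of error slightly above delta.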

Python
generate function Mon, 27 Mar 2023

This code removes stopwords in lists incorrectly: without_sw = [[word for word in text if word not in stopwords_set] for text in only_words_text] It turns out like this: [[g], [u, n, l], [j, u, r, n, g], [p, n, ] It should be like this: [go, until, jurong, point Correct it.

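from nltk.corpus import stopwords

stopwords_set = set(stopwords.words('english'))
# The faulty output suggests only_words_text is a flat list of words,
# so the nested loop was iterating over the characters of each word.
# Iterating once over the words avoids that.
without_sw = [word for word in only_words_text if word not in stopwords_set]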

Python
generate function Sun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: water, load (or max_load in Ruby) and clothes.

def how_much_water(water, load, clothes):
    # At or below the standard load, the base amount of water is enough
    if clothes <= load:
        return water
    # Each extra item multiplies the water needed by 1.1
    return how_much_water(water, load, clothes - 1) * 1.1

Python
generate function Wed, 22 Feb 2023

Add the x-axis value captions to the columns fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Which work environment do you prefer most?') fig.show()

import plotly.express as px

# text=counts writes each bar's x value on the bar
fig = px.bar(x=counts, y=label, orientation='h', text=counts)
fig.update_traces(textposition='outside')
fig.update_layout(title_text='Which work environment do you prefer most?')
fig.show()

Python
generate function Tue, 21 Mar 2023

There are 3 lists: water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1] nutri = [1,2,4,6,5,6,7,5,4,5,6,7,4,3,5,5,6,5,4,3,5] mineral =[2,1,1,3,2,4,2,4,5,4,3,2,3,2,3,1,3,4,5,1,4] add them to df Pandas dataframe in the following format: index treatments value 0 0 water 25 1 1 water 30 2 2 water 28 3 3 nutri 36 4 4 nutri 29 5 0 nutri 45 6 1 mineral 55 7 2 mineral 29 8 3 mineral 56

import pandas as pd

water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1]
nutri = [1,2,4,6,5,6,7,5,4,5,6,7,4,3,5,5,6,5,4,3,5]
mineral = [2,1,1,3,2,4,2,4,5,4,3,2,3,2,3,1,3,4,5,1,4]

# The lists differ in length (20, 21 and 21 values), so a wide dataframe
# cannot be built from them directly; build the long format row by row.
groups = {"water": water, "nutri": nutri, "mineral": mineral}
rows = [(i, name, v) for name, values in groups.items() for i, v in enumerate(values)]
df = pd.DataFrame(rows, columns=["index", "treatments", "value"])

Python
generate function Tue, 21 Mar 2023

Rewrite this code to calculate statistics by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

from scipy.stats import mannwhitneyu import numpy as np def mann_whitney_plus_means(turnstile_weather): ''' This function will consume the turnstile_weather dataframe containing our final turnstile weather data. You will want to take the means and run the Mann Whitney U test on the ENTRIESn_hourly column in the turnstile_weather dataframe. This function should return: 1) the mean of entries with rain 2) the mean of entries without rain 3) the Mann-Whitney U statistic and p-value comparing the number of entries with rain and the number of entries without rain You should feel free to use scipy's Mann-Whitney implementation, and you might also find it useful to use numpy's mean function. Here are the functions' documentation: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.
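
For the request above, a minimal sketch under stated assumptions. scipy.stats.wilcoxon is the signed-rank test for paired samples of equal length, while the two versions in the prompt look like independent groups, so this sketch assumes the Wilcoxon rank-sum test (scipy.stats.ranksums) is what is wanted. The short lists are hypothetical stand-ins for data[data['version'] == 'gate_30']['sum_gamerounds'] and its gate_40 counterpart, and compare_groups is a hypothetical name.

```python
from scipy.stats import ranksums


def compare_groups(sample_a, sample_b):
    """Wilcoxon rank-sum test for two independent samples."""
    stat, p = ranksums(sample_a, sample_b)
    return stat, p


# Hypothetical stand-ins for the two sum_gamerounds columns
gate_30 = [1, 2, 3, 4, 5]
gate_40 = [6, 7, 8, 9, 10]

stat, p = compare_groups(gate_30, gate_40)
print('Wilcoxon rank-sum Statistics=%.3f, p=%.3f' % (stat, p))
```

If the observations really are paired and of equal length, scipy.stats.wilcoxon(sample_a, sample_b) would be the call to use instead.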

Python
generate function Fri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

import numpy as np
from scipy import stats

def interval(n, mean, sig, conf):
    # Two-sided normal interval for the mean with standard error sig / sqrt(n)
    left, right = stats.norm.interval(conf, loc=mean, scale=sig / np.sqrt(n))
    h = right - left
    return round(h)

Python
generate function Thu, 15 Dec 2022

Create a function that allows you to move values between directories keys. It should check for: 1) if the specified key is in the dictionary and show ERROR NO SUCH KEY if it does not exist 2) if the value is present in the dictionary and show ERROR NO SUCH VALUE if it doesn't exist 3) if both key and value are in the dictionary, then the function should move value v to another k and return a new dictionary Example: Input: directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } v = '11-2' k = '3' Output: directories = { '1': ['2207 876234'], '2': ['10006'], '3': ['11-2'] }

def move(directories, v, k):
    if k not in directories:
        print('ERROR NO SUCH KEY')
        return directories
    # Look for v under every key, not only under the target key
    if not any(v in values for values in directories.values()):
        print('ERROR NO SUCH VALUE')
        return directories
    # Remove v from wherever it currently is, then add it under k
    for values in directories.values():
        if v in values:
            values.remove(v)
    directories[k].append(v)
    return directories

directories = {'1': ['2207 876234', '11-2'], '2': ['10006'], '3': []}
move(directories, '11-2', '3')

Python
generate function Tue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

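misclassified = []
# Assumes y_test is a pandas Series that kept the row labels of df after the split.
# Iterate over those labels instead of the positions 0..n-1 from enumerate.
for row_label, label_actual, label_predicted in zip(y_test.index, y_test, predicted):
    if label_predicted != label_actual:
        misclassified.append({'message': df.loc[row_label, 'Message'],
                              'actual': label_actual,
                              'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)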
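
The KeyError most likely comes from y_test[index]: after a train/test split, y_test keeps the original row labels of df, so looking up the positions 0, 1, 2, ... by label fails whenever that label is not in the test set. A small sketch of the effect and of a position-based lookup, assuming y_test is a pandas Series; the four-row dataframe and the predicted list are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe in the prompt
df = pd.DataFrame({
    'Category': ['ham', 'ham', 'spam', 'ham'],
    'Message': ['a', 'b', 'c', 'd'],
})

# Stand-in for a test split: rows 3 and 1, original labels kept
y_test = df['Category'].iloc[[3, 1]]
predicted = ['spam', 'ham']  # hypothetical model output

misclassified = []
for position, label_predicted in enumerate(predicted):
    # .iloc looks up by position; y_test[position] would look up by label
    actual = y_test.iloc[position]
    if label_predicted != actual:
        # Map the position back to the row label in df
        row_label = y_test.index[position]
        misclassified.append({
            'message': df.loc[row_label, 'Message'],
            'actual': actual,
            'predicted': label_predicted,
        })

misclassification_df = pd.DataFrame(misclassified)
print(misclassification_df)
```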

Python
generate function Fri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Sort the name values in ascending order

# Keep every row whose name occurs more than once, sorted by name
df[df.duplicated(subset='name', keep=False)].sort_values('name')

Python
generate function Mon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

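def group_lst(num):
    # Collect [surname, first name, patronymic] for students whose group matches num
    students = sorted(value[:3] for value in dct.values() if value[4] == num)
    for i, student in enumerate(students, start=1):
        print(i, *student)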

Python
generate function Fri, 17 Mar 2023

Determine the sample size needed to study the average coffee bill in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

import math
from scipy import stats

def sample_size(delta, sigsqr, conf):
    # Two-sided critical value: about 1.96 for conf = 0.95
    z = stats.norm.ppf((1 + conf) / 2)
    # n = z^2 * sigsqr / delta^2, rounded up
    n = math.ceil((z*z*sigsqr)/(delta**2))
    return n

sample_size(10, 100, 0.95)

Python
generate function Sun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#). Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math

class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

def distance(a, b):
    # Euclidean distance between the two points
    c = math.sqrt((a.x - b.x)**2 + (a.y - b.y)**2)
    return c

a = Point(1, 1)
b = Point(1, 2)
print(distance(a, b))

Python
generate function Fri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id from 1 to 23158 for each line of the pandas dataframe

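# Assigning a range avoids index alignment, so it also works when
# df_new does not have the default 0..N-1 index
df_new['unique_id'] = range(1, df_new.shape[0] + 1)
df_new.head()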

Python
generate function Wed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a scaled_data dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

import numpy as np
import pandas as pd

# sl and sw are assumed to be the n-by-1 arrays shown in the prompt;
# ravel() flattens each one into a plain column of values.
scaled_data = pd.DataFrame({'sl': np.ravel(sl), 'sw': np.ravel(sw)})

Python
generate function Wed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

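def year_leaders(df):
    # Row label of the entry with the most hits within each chart_debut year
    idx = df.groupby('chart_debut')['num_of_hits'].idxmax()
    return df.loc[idx, ['performer', 'hits']]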

Python
generate function Wed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

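def group_movies(df):
    # One row per unique title with its mean rating
    return df.groupby('title', as_index=False)['rating'].mean()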

Python
generate function Tue, 14 Feb 2023

There is a df Pandas dataframe: Date Open High Low Close Adj Close Volume 0 2022-02-14 133.365494 136.166504 133.302002 135.300003 135.300003 26792000 1 2022-02-15 137.471497 137.899994 135.539505 136.425507 136.425507 26578000 2 2022-02-16 136.430496 137.945999 134.823654 137.487503 137.487503 25610000 3 2022-02-17 136.149994 136.839493 132.201996 132.308502 132.308502 30968000 Write a function that creates an is_month_end column and calculates values there as follows: 1) if the 'Date' column contains the last day of the month, the 'is_month_end' column will contain number 1 2) otherwise 0 is written

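import pandas as pd

def is_month_end(df):
    # dt.is_month_end is True on the last calendar day of a month,
    # which also covers months with 28, 29 or 30 days
    df['is_month_end'] = pd.to_datetime(df['Date']).dt.is_month_end.astype(int)
    return df

is_month_end(df)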
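
Checking whether the date string ends in '31' misses every month that has 28, 29 or 30 days. A minimal standard-library sketch of the underlying rule for a single date, assuming the value has already been parsed into a datetime.date; the name month_end_flag is hypothetical.

```python
import calendar
from datetime import date


def month_end_flag(d: date) -> int:
    """Return 1 if d is the last day of its month, otherwise 0."""
    # monthrange returns (weekday of the first day, number of days in the month)
    last_day = calendar.monthrange(d.year, d.month)[1]
    return 1 if d.day == last_day else 0


print(month_end_flag(date(2022, 2, 14)))
print(month_end_flag(date(2022, 2, 28)))
```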

Python
generate function Mon, 13 Mar 2023

This code does not work and gives the following error: ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted']. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(data, data_class, random_state=42) from sklearn.discriminant_analysis import LinearDiscriminantAnalysis lda.fit(X_train, y_train) from sklearn.metrics import f1_score y_pred = lda.predict(X_test) f1_score(y_test, y_pred) Write code to evaluate its quality with F1 measure

from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import f1_score
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

le = LabelEncoder()
le.fit(data_class)
data_class = le.transform(data_class)
X_train, X_test, y_train, y_test = train_test_split(data, data_class, random_state=42)
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
y_pred = lda.predict(X_test)
# A multiclass target needs an explicit average setting
f1 = f1_score(y_test, y_pred, average='macro')
print('F1:', f1)
#confusion_matrix(y_test, y_pred)
print(classification_report(y_test, y_

Python
generate function Sat, 25 Feb 2023

x1 = (round(g1['rate_group'].value_counts() / g1['rate_group'].value_counts().sum() * 100, 2)) x2 = (round(g2['rate_group'].value_counts() / g2['rate_group'].value_counts().sum() * 100, 2)) x1.plot(kind='hist', alpha=0.5, bins=6, density=True) x2.plot(kind='hist', alpha=0.5, bins=6, density=True) plt.legend(loc='upper left') plt.title('Comparison of distributions with and without own housing') Add legend labels to the chart in the code above: 'x1' for x1 and 'x2' for x2

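# label= gives each histogram its legend entry
x1.plot(kind='hist', alpha=0.5, bins=6, density=True, label='x1')
x2.plot(kind='hist', alpha=0.5, bins=6, density=True, label='x2')
plt.legend(loc='upper left')
plt.title('Comparison of distributions with and without own housing')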

Python
generate function Sun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: water, load (or max_load in Ruby) and clothes.

def how_much_water(water, load, clothes):
    # Closed form: 10% more water for every item above the load
    return water * (1.1 ** (clothes - load))

how_much_water(5, 10, 14)

Python
generate function Tue, 24 Jan 2023

there is a Pandas dataframe: timestamp visit_id url region user_id traffic_source 0 1549980692 e3b0c44298 https://host.ru/3c19b4ef7371864fa3 Russia b1613cc09f yandex 1 1549980704 6e340b9cff https://host.ru/c8d9213a31839f9a3a Germany 4c3ec14bee direct 2 1549980715 96a296d224 https://host.ru/b8b58337d272ee7b15 USA a8c40697fb yandex 3 1549980725 709e80c884 https://host.ru/b8b58337d272ee7b15 Italy 521ac1d6a0 yandex 4 1549980736 df3f619804 https://host.ru/b8b58337d272ee7b15 Russia d7323c571c yandex Create a new dataframe summary in which count how many traffic_source values relate to each region

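# Count rows per (region, traffic_source) pair: one row per region, one column per source
summary = pd.crosstab(df['region'], df['traffic_source']).reset_index()

# Illustrative shape of the result (the values are not computed from the sample above):
#     region  direct  yandex  google
# 0   Russia       1       4       0
# 1  Germany       0       1       0
# 2      USA       0       0       1
# 3    Italy       0       1       0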

Python
generate function Fri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

# Keep every row whose name occurs more than once (first occurrence included), sorted by name
df_duplicates = df[df.duplicated(subset='name', keep=False)].sort_values('name')
df_duplicates

Python
generate function Mon, 26 Dec 2022

Write a code that looks for a repeating student number and replaces it with "9090". lst = [ ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"] ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def replace_student(lst):
    seen = set()
    for student in lst:
        # A number that has already appeared is a repeat: replace it
        if student[0] in seen:
            student[0] = '9090'
        else:
            seen.add(student[0])
    return lst

print(replace_student(lst))

Python
generate function Fri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Create a new column "is_loyal" through the lambda function, in which you put the value "True" if duplicates are found in the "name" column. If not put a "False"

df['is_loyal'] = df.duplicated(subset='name', keep=False).apply(lambda x: 'True' if x else 'False')

Python
generate function Wed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new data frame and leave only the rows that have the maximum value of num_of_hits in it when grouped by the chart_debut column

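import pandas as pd

df = pd.DataFrame({'chart_debut': ['2012', '2012', '2012', '2014', '2017'], 'num_of_hits': [1,2,3,4,5]})

def year_leaders(df):
    # Keep the rows whose num_of_hits equals the maximum within their chart_debut group
    max_hits = df.groupby('chart_debut')['num_of_hits'].transform('max')
    return df[df['num_of_hits'] == max_hits]

year_leaders(df)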

Python
generate function Tue, 13 Dec 2022

Write a function that outputs True if the letters in it are one after the other in the string, and False otherwise. def goes_after(word: str, first: str, second: str) -> bool: s = '' for i, let in enumerate(word): print (let[i] + let[i+1]) print(first + second) if let[i] + let[i-1] == first + second: return True else: return False Examples: goes_after("world", "o", "r") Result: True goes_after("world", "w", "r") Result: False

def goes_after(word: str, first: str, second: str) -> bool:
    # True if some occurrence of first is immediately followed by second
    return any(a == first and b == second for a, b in zip(word, word[1:]))

goes_after("world", "o", "r")

Python
generate function Sun, 11 Dec 2022

Change the code in this program so that the value of 'quantity' is multiplied by the variable 'portions' and the result is displayed: def change_quantity(dish, portions): return dish['quantity'] * portions change_quantity(dish, portions) dish = 'salad' portions = 5 recipe book = { 'salad': [ { 'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'} {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'} {'ingridient_name': 'cucumbers', 'quantity': 20, 'measure': 'g'} {'ingridient_name': 'olives', 'quantity': 10, 'measure': 'g'} {'ingridient_name': 'olive oil', 'quantity': 20, 'measure': 'ml'} {'ingridient_name': 'salad', 'quantity': 10, 'measure': 'g'} {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'g'} ]}

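def prepare_dish(dish, portions):
    # Assumes the dictionary from the prompt is stored as recipe_book.
    # Look the dish up by name, then scale every ingredient's quantity.
    ingredients = recipe_book[dish]
    for ingridient in ingredients:
        ingridient['quantity'] = ingridient['quantity'] * portions
    return ingredients

prepare_dish(dish, portions)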

Python
generate function Wed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders_all(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    return dfp_copy

year_leaders_all(dfp)

Python
generate function Tue, 13 Dec 2022

You are given a method called main, make it print the line Hello World!, (yes, that includes a new line character at the end) and don't return anything Note that for some languages, the function main is the entry point of the program. Here's how it will be tested: Solution.main("parameter1", "parameter2","parametern") Hints: Check your references Think about the scope of your method For prolog you can use write but there are better ways If you still don't get it probably you can define main as an attribute of the Solution class that accepts a single argument, and that only prints "Hello World!" without any return.

class Solution(object):
    @staticmethod
    def main(*args):
        # Accept and ignore any number of arguments; print adds the trailing newline
        print("Hello World!")

Solution.main("parameter1", "parameter2")

Python
generate function Tue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

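misclassified = []
# Assumes X_test and y_test are pandas Series. .iloc looks up by position;
# plain [] would look up by row label and raise KeyError after a shuffled split.
for index, label_predicted in enumerate(predicted):
    if label_predicted != y_test.iloc[index]:
        misclassified.append({'message': X_test.iloc[index], 'actual': y_test.iloc[index], 'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)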

Python
generate function Thu, 15 Dec 2022

Create a function that allows you to move doc to another shelf directories. It should check for: 1) if the specified shelf exists in directories and give ERROR NO SUCH KEY if it does not exist 2) if the specified doc exists in the directories and show ERROR NO SUCH VALUE if it doesn't exist 3) if both doc and shelf are in the dictionary, the function should move the value of doc to the specified shelf and make return updated directories. Note that directories is a dictionary and doc values are in lists! Example: Input: directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } doc = '11-2' shelf = '3' Output: directories = { '1': ['2207 876234'], '2': ['10006'], '3': ['11-2'] }

def move_doc(directories, doc, shelf):
    if shelf not in directories:
        print('ERROR NO SUCH KEY')
        return directories
    # Look for the doc on every shelf, not only on the target shelf
    if not any(doc in docs for docs in directories.values()):
        print('ERROR NO SUCH VALUE')
        return directories
    # Take the doc off its current shelf, then put it on the target shelf
    for docs in directories.values():
        if doc in docs:
            docs.remove(doc)
    directories[shelf].append(doc)
    return directories

directories = {'1': ['2207 876234', '11-2', '5455 028765'], '2': ['10006', '5400 028765', '5455 002299'], '3': []}
doc = '11-2'
shelf = '3'
move_doc(directories, doc, shelf)

Python
generate function Sun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function def how_much_water(L,X,N) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: water, load (or max_load in Ruby) and clothes.

def how_much_water(L, X, N):
    # L: base amount of water, X: standard load, N: clothes to wash
    return L * (1 + 0.1) ** (N - X)

how_much_water(5, 10, 14)

Python
generate function Tue, 13 Dec 2022

Create a function called _if which takes 3 arguments: a boolean value bool and 2 functions (which do not take any parameters): func1 and func2 When bool is truth-ish, func1 should be called, otherwise call the func2. Example: def truthy(): print("True") def falsey(): print("False") _if(True, truthy, falsey) # prints 'True' to the console

def _if(bool, func1, func2):
    if bool:
        func1()
    else:
        func2()

def truthy():
    print("True")

def falsey():
    print("False")

_if(True, truthy, falsey)

Python
generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 155 [free, entry, wkly, comp, win, fa, cup, final,... The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] For index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['message'], 'actual': df.iloc[index]['label'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it

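misclassified = []
# The columns are 'Message' and 'Category'; 'message' and 'label' raise KeyError.
# Assumes y_test is a pandas Series that kept df's index labels after train_test_split,
# so rows are looked up by label instead of by position.
for row_label, label_actual, label_predicted in zip(y_test.index, y_test, predicted):
    if label_predicted != label_actual:
        misclassified.append({'message': df.loc[row_label, 'Message'],
                              'actual': label_actual,
                              'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)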

Python
generate functionWed, 08 Mar 2023

Upgrade the code so that a roll of 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090 will drop 1 dollar for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

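import numpy as np

def ends77(x): return x % 100 == 77
def ends7(x): return x % 10 == 7
def ends00(x): return x % 100 == 0
def ends0(x): return x % 10 == 0

def drops_dollar(x):
    # 000, 007, 017, ..., 097 and 010, 020, ..., 090: rolls below 100 ending in 0 or 7
    return x < 100 and x % 10 in (0, 7)

money = 0
successes = 0
trials = 100000
for i in range(trials):
    num = np.random.randint(0, 1000)  # 0..999, so that 000 can actually be rolled
    if num == 777: money += 200; successes += 1
    elif num == 999: money += 100; successes += 1
    elif num == 555: money += 50; successes += 1
    elif num == 333: money += 15; successes += 1
    elif num == 111: money += 10; successes += 1
    # assumption: "drop 1 dollar" means a 1-dollar payout, checked before the ends77/ends7 rules
    elif drops_dollar(num): money += 1; successes += 1
    elif ends77(num): money += 5; successes += 1
    elif ends7(num): money += 3; successes += 1
    elif ends00(num): money += 2; successes += 1
    elif ends0(num): money += 1; successes += 1
    else: money -= 1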

Python
generate functionWed, 21 Dec 2022

Create a function that finds the key with the maximum value in 'Value' and displays it in key - value format: {'AUD': {'ID': 'R01010', {'NumCode': '036', {'CharCode': 'AUD', {'Nominal': 1, 'Name': 'Australian Dollar', 'Value': '46.9983, 'Previous': 45.9496}, 'AZN': {'ID': 'R01020A', 'NumCode': '944', 'CharCode': 'AZN', 'Nominal': 1, 'Name': 'AZN', Value: 41.4856, Previous': 40.5904}, 'GBP': {'ID': 'R01035', 'NumCode': '826', 'CharCode': 'GBP', 'Nominal': 1, 'Name': 'Pound Sterling United Kingdom', 'Value': 85.611, 'Previous': 83.7015},

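def max_in_dictionary(d):
    # 'Value' may be stored as a string, so compare as float
    max_key = max(d, key=lambda key: float(d[key]['Value']))
    # key - value format
    return f"{max_key} - {d[max_key]['Value']}"

max_in_dictionary(rates)  # rates is the dictionary from the task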

Python
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the "timestamp" column for a given "userId".

import pandas as pd

def average_lifetime(df):
    # lifetime of a user = max(timestamp) - min(timestamp) within that userId
    per_user = df.groupby('userId')['timestamp'].agg(['min', 'max'])
    return (per_user['max'] - per_user['min']).mean()

print(average_lifetime(df))  # df is the dataframe from the task

Python
generate functionMon, 12 Dec 2022

Create a function that returns the CSV representation of a two-dimensional numeric array. Example: input: [[ 0, 1, 2, 3, 4 ], [ 10,11,12,13,14 ], [ 20,21,22,23,24 ], [ 30,31,32,33,34 ]] output: '0,1,2,3,4\n' +'10,11,12,13,14\n' +'20,21,22,23,24\n' +'30,31,32,33,34' Array's length > 2.

def to_csv_string(array):
    # one comma-separated line per row, rows joined with newlines
    return '\n'.join(','.join(str(value) for value in row) for row in array)

to_csv_string([[0, 1, 2, 3, 4],
               [10, 11, 12, 13, 14],
               [20, 21, 22, 23, 24],
               [30, 31, 32, 33, 34]])

Python
generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that will create a new hot_years dataframe, group the data by the "year" column, average the values in the "av_temp" column, and leave only those lines in the dataframe where av_temp > 15

def hot_years_create(df_ru, min_temp):
    # average av_temp per year, then keep only the years above min_temp
    df_ru_hot = df_ru.groupby(['year'])['av_temp'].mean().reset_index()
    return df_ru_hot[df_ru_hot['av_temp'] > min_temp]

hot_years = hot_years_create(df_ru, 15)

Python
generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function def how_much_water(L,X,N) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes):
    if load >= clothes:
        return water
    else:
        return water * 1.1 ** (clothes - load)

how_much_water(5, 10, 14)

Python
generate functionThu, 20 Apr 2023

When seed=42 is fixed, try experimenting with the differential_evolution algorithm parameters: strategy, popsize, mutation. import numpy as np from scipy.optimize import differential_evolution import matplotlib.pyplot as plt %matplotlib inline def ackley(x): arg1 = -0.2 * np.sqrt(0.5 * (x[0] ** 2 + x[1] ** 2)) arg2 = 0.5 * (np.cos(2. * np.pi * x[0]) + np.cos(2. * np.pi * x[1])) return -20. * np.exp(arg1) - np.exp(arg2) + 20. + np.e bounds = [(-10, 10), (-10, 10)] result = differential_evolution(ackley, bounds, seed=42) ###YOUR CODE HERE

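import numpy as np
from scipy.optimize import differential_evolution
import matplotlib.pyplot as plt

def ackley(x):
    arg1 = -0.2 * np.sqrt(0.5 * (x[0] ** 2 + x[1] ** 2))
    arg2 = 0.5 * (np.cos(2. * np.pi * x[0]) + np.cos(2. * np.pi * x[1]))
    return -20. * np.exp(arg1) - np.exp(arg2) + 20. + np.e

bounds = [(-10, 10), (-10, 10)]

def plot_optimisation(strategy, popsize=15, mutation=(0.5, 1)):
    result = differential_evolution(ackley, bounds, strategy=strategy,
                                    popsize=popsize, mutation=mutation, seed=42)
    x = np.linspace(-10, 10, 100)
    y = np.linspace(-10, 10, 100)
    X, Y = np.meshgrid(x, y)
    Z = np.array([ackley([xi, yi]) for xi, yi in zip(np.ravel(X), np.ravel(Y))]).reshape(X.shape)
    # The generated snippet breaks off at this point; the plotting below is an assumed completion
    plt.contourf(X, Y, Z, levels=30)
    plt.scatter(result.x[0], result.x[1], color='red')
    plt.title(f'{strategy}: f = {result.fun:.3g}, iterations = {result.nit}')
    plt.show()
    return result

for strategy in ('best1bin', 'rand1bin', 'best2bin'):
    plot_optimisation(strategy)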

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 We need to find customers who have contacted us several times and make a separate table with them. That is, we need to find duplicate values in the name column and leave only them, all the rest removed. Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

def remove_duplicate(df):
    # keep only the rows whose name occurs more than once
    return df[df.duplicated(subset="name", keep=False)]

Python
generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

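from scipy.stats import norm

def interval(n, mean, sig, conf):
    # half-width of the two-sided interval: z quantile at (1 + conf) / 2
    h = norm.ppf((1 + conf) / 2) * sig / n ** 0.5
    # the length is the right end minus the left end, that is 2 * h
    return round(2 * h)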
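
As a quick numeric check of the formula, here is a minimal sketch that uses only the standard library. It assumes a z-interval (the standard deviation is treated as known); the name interval_length and the sample numbers are illustrative and not part of the task.

```python
from statistics import NormalDist

def interval_length(n, sig, conf):
    # two-sided interval: the z quantile is taken at (1 + conf) / 2
    z = NormalDist().inv_cdf((1 + conf) / 2)
    half_width = z * sig / n ** 0.5
    # length = right end - left end = 2 * half-width; the mean cancels out
    return round(2 * half_width)

print(interval_length(n=25, sig=10, conf=0.95))  # 2 * 1.96 * 10 / 5 = 7.84, rounds to 8
```

The sample mean only shifts the interval, so it does not affect the length.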

Python
generate functionMon, 27 Mar 2023

There are variables X_train, X_test, y_train, y_test created through sklearn train_test_split. They contain text data in string format. Convert them to float numbers.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# demo data stored as strings, to mimic text input
df = pd.DataFrame(np.random.randn(10, 2).astype(str), columns=['Col1', 'Col2'])
X_train, X_test, y_train, y_test = train_test_split(df['Col1'], df['Col2'], random_state=1)

def convert_to_float(x_train, x_test, y_train, y_test):
    return (
        x_train.astype(float),
        x_test.astype(float),
        y_train.astype(float),
        y_test.astype(float)
    )

X_train, X_test, y_train, y_test = convert_to_float(X_train, X_test, y_train, y_test)

Python
generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value of each column

import pandas as pd
import matplotlib.pyplot as plt

# values taken from the dataframe in the task (20 rows, index 0..19)
df = pd.DataFrame({
    'water':   [1, 2, 3, 4, 2, 4, 2, 4, 5, 2, 3, 4, 2, 1, 3, 4, 3, 2, 5, 1],
    'nutri':   [1, 2, 4, 6, 5, 6, 7, 5, 4, 5, 6, 7, 4, 3, 5, 5, 6, 5, 4, 3],
    'mineral': [2, 1, 1, 3, 2, 4, 2, 4, 5, 4, 3, 2, 3, 2, 3, 1, 3, 4, 5, 1],
})

# grouped vertical bars: one group per index value, one bar per column
# (pandas plotting is used here because the generated seaborn call was cut off)
df.plot(kind='bar', figsize=(12, 5))
plt.xlabel('index')
plt.ylabel('value')
plt.show()

Python
generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to rows

def columns_to_rows(dataframe):
    # transpose: every column becomes a row (works for any index label, not only 0)
    return dataframe.T

Python
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def grass(df):
    # drop the rows where either class is 'Rock'
    df = df[df['Class 1'] != 'Rock']
    df = df[df['Class 2'] != 'Rock']
    return df

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Merge the rows with duplicates and sort the name column in ascending order

import pandas as pd

def filter_df(df, column):
    # keep every row whose value in `column` is duplicated, sorted ascending
    return df[df[column].duplicated(keep=False)].sort_values(column)

df = pd.DataFrame({
    'name': ['Ksenia Rodionova', 'Ulyana Selezneva', 'Konstantin Prokhorov',
             'Petrov Vladimir', 'Arina Selivanova', 'Ksenia Rodionova'],
    'profit_per_room': [1639.000000, 930.000000, 1057.720000,
                        1403.000000, 1639.000000, 1639.000000]
})
filter_df(df, 'name')

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

df = pd.DataFrame({'name': ['Ksenia Rodionova', 'Artur Petrov', 'Ivan Sidorov', 'Ksenia Rodionova']})
# keep=False marks the first occurrence and all later ones, so only the duplicated names remain
duplicates = df[df.duplicated(subset='name', keep=False)].sort_values(by='name', ascending=True)
duplicates

Python
generate functionWed, 18 Jan 2023

There is a dataframe of Pandas: id movie rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (1994) 3.5 2 Three Colors: Blue (1993) 5.0 3 Underground (1995) 5.0 4 Singing in the Rain (1952) 3.5 Write a function that creates a new 'class' column and substitutes values there depending on the rating: - grade 2 and below - low rating; - grade 4 and below - average rating; - 4.5 and 5 - high rating.

def ratings(x):
    if x <= 2.0:
        return 'low rating'
    elif x <= 4.0:
        return 'average rating'
    else:
        return 'high rating'

# write to a new 'class' column instead of overwriting 'rating'
df['class'] = df['rating'].apply(ratings)
df.head()

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column.

def chart_to_hits(df):
    # one row per performer; the unique songs are joined with ',' into a 'hits' column
    return (df.groupby('performer')['song']
              .apply(lambda songs: ','.join(songs.unique()))
              .reset_index(name='hits'))

Python
generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

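misclassified = []
# Assumes y_test is a pandas Series that kept df's index labels after train_test_split.
# y_test[index] with a positional index raises KeyError on such a Series, so iterate by label.
for row_label, label_actual, label_predicted in zip(y_test.index, y_test, predicted):
    if label_predicted != label_actual:
        misclassified.append({'message': df.loc[row_label, 'Message'],
                              'actual': df.loc[row_label, 'Category'],
                              'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)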

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 3819 Imagine Dragons Radioactive, Natural, Believer, Thunder, On To... 2012 11101 9304.0 2674 Ed Sheeran Thinking Out Loud, Photograph, Sing, Don't, On... 2014 10685 10039.0 9590 The Weeknd Starry Eyes, Save Your Tears, Acquainted, Blin... 2022 9429 6713.0 5812 Mariah Carey Without You/Never Forget You, Don't Forget Abo... 1994 8735 6616.0 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0 Write a function divide_decade(df, decade) which, when a decade is specified, will perform the appropriate filter of the chart_debut column on the grouped Performer column. For example divide_decade(df, 1989) 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0

def divide_decade(df, decade):
    # any year of the decade selects that decade, e.g. 1989 -> 1980..1989
    decade_df = df[df['chart_debut'] // 10 == decade // 10]
    return decade_df

Python
generate functionThu, 23 Mar 2023

Write code in Python that downloads all the text file names at url: http://stup.pro/wp-content/uploads/2023/03/*.txt into one list of book_files

import requests
from bs4 import BeautifulSoup

def get_book_files(url):
    # relies on the server returning a directory listing page with links to the files
    text_files = []
    resp = requests.get(url)
    soup = BeautifulSoup(resp.text, 'lxml')
    for link in soup.find_all('a', href=True):
        if link['href'].endswith('.txt'):
            text_files.append(link['href'])
    return text_files

book_files = get_book_files('http://stup.pro/wp-content/uploads/2023/03/')
print(book_files)

Python
generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. import numpy as np from scipy import stats def seed(seed): np.random.seed(seed) n = np.random.choice(range(10,26)) mean = np.random.choice(range(120,141)) sig = np.random.choice(range(10,21)) conf = np.random.choice([0.90, 0.95, 0.98, 0.99, 0.999]) return n, mean, sig, conf def interval(n, mean, sig, conf): h = return print(interval(*seed(12))) print(interval(*seed(45))) print(interval(*seed(7))) The result of the function should be three values: 12, 28, 21

import numpy as np
from scipy import stats

def seed(seed):
    np.random.seed(seed)
    n = np.random.choice(range(10, 26))
    mean = np.random.choice(range(120, 141))
    sig = np.random.choice(range(10, 21))
    conf = np.random.choice([0.90, 0.95, 0.98, 0.99, 0.999])
    return n, mean, sig, conf

def interval(n, mean, sig, conf):
    # half-width from Student's t distribution with n - 1 degrees of freedom
    h = sig * stats.t.ppf((1 + conf) / 2, n - 1) / np.sqrt(n)
    # the task asks for the full length (right end minus left end), rounded to an integer
    return int(round(2 * h))

print(interval(*seed(12)))
print(interval(*seed(45)))
print(interval(*seed(7)))
# According to the task, the result should be three values: 12, 28, 21

Python
generate functionWed, 18 Jan 2023

Fix this function so that when it finds a value in a Pandas dataframe string, it returns its key: def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data.values(): if value in row: return key return 'undefined' df2['region'] = df2['keyword'].apply(geo_class)

df2['region'] = df2['keyword'].apply(geo_class)
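
A minimal sketch of one possible fix for geo_class itself. The function in the task loops over keys and values independently and tests a whole list against the string; iterating over items() and checking each city separately avoids both problems. Case-insensitive matching is an assumption here, made because the dictionary mixes 'Moscow' with 'petersburg'; GEO_DATA is an illustrative module-level name.

```python
GEO_DATA = {
    'center': ['Moscow', 'Tula', 'Yaroslavl'],
    'Northwest': ['petersburg', 'pskov', 'murmansk'],
    'Far East': ['vladivostok', 'sakhalin', 'khabarovsk'],
}

def geo_class(row):
    # compare in lower case so 'Moscow' and 'moscow' both match (assumption)
    text = str(row).lower()
    for region, cities in GEO_DATA.items():
        # each key is checked only against its own list of cities
        if any(city.lower() in text for city in cities):
            return region
    return 'undefined'

print(geo_class('flights to Moscow'))
print(geo_class('pskov hotels'))
print(geo_class('weather in Paris'))
```

With this definition the apply call above can stay unchanged.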

Python
generate functionTue, 13 Dec 2022

Given an array of numbers, check if any of the numbers are the character codes for lower case vowels (a, e, i, o, u). If they are, change the array value to a string of that vowel. Return the resulting array. Example: Input: inp = [118, "u",120,121,"u",98,122,"a",120,106,104,116,113,114,113,120,106 ] Output: [118,117,120,121,117,98,122,97,120,106,104,116,113,114,113,120,106 ]

def replace_vowels(vowels):
    vowel_codes = [97, 101, 105, 111, 117]  # a, e, i, o, u
    result = []
    for vowel in vowels:
        if isinstance(vowel, str):
            result.append(vowel)
        elif vowel in vowel_codes:
            result.append(chr(vowel))
        else:
            result.append(vowel)
    return result

inp = [118, "u", 120, 121, "u", 98, 122, "a", 120, 106, 104, 116, 113, 114, 113, 120, 106]
print(replace_vowels(inp))

Python
generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

# 1. how much water does my washing machine use: 5 litres
# 2. how much water do my clothes need: 5 * 1.1 ^ (14 - 10) = 5 * 1.1 ^ 4
# 3. 5 * 1.1 * 1.1 * 1.1 * 1.1 = 7.3205
def how_much_water(water, load, clothes):
    return (1.1 ** (clothes - load)) * water

print(how_much_water(5, 10, 14))

Python
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id from 1 to 23158 for each line of the pandas dataframe

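def generate_unique_id(dataframe):
    # number the rows 1..len(dataframe), whatever the current index looks like
    dataframe['id'] = range(1, len(dataframe) + 1)
    return dataframe

generate_unique_id(df_new)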

Python
generate functionThu, 12 Jan 2023

There is a table: userId movieId rating timestamp title genres 0 1 296 5.0 1147880044 Pulp Fiction (1994) Comedy|Crime|Drama|Thriller 1 1 306 3.5 1147868817 Three Colors: Red (Trois couleurs: Rouge) (1994) Drama 2 1 307 5.0 1147868828 Three Colors: Blue (Trois couleurs: Bleu) (1993) Drama 3 1 665 5.0 1147878820 Underground (1995) Comedy|Drama|War 4 1 899 3.5 1147868510 Singin' in the Rain (1952) Comedy|Musical|Romance Write code for Pandas that shows the movie with the most rows with a value of 5.0 in the rating column

def most_5(data):
    # keep only the 5.0 ratings, then count the rows per movie title
    fives = data[data['rating'] == 5.0]
    return fives['title'].value_counts().head(1)

most_5(df)

Python
generate functionMon, 12 Dec 2022

Create a program that will take two lists of integers, a and b. Each list will consist of 3 positive integers above 0, representing the dimensions of cuboids a and b. You must find the difference of the cuboids' volumes regardless of which is bigger. For example, if the parameters passed are ([2, 2, 3], [5, 4, 1]), the volume of a is 12 and the volume of b is 20. Therefore, the function should return 8. Write solution in one line of code.

from functools import reduce  # reduce is not a builtin in Python 3

def find_difference(a, b): return abs(reduce(lambda x, y: x * y, a) - reduce(lambda x, y: x * y, b))

Python
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of dates for all the operators

def avg_date_by_operator(df_new):
    # number of distinct dates per operator, averaged over all operators
    # (assumption: "number of dates" means distinct dates; use .count() to count rows instead)
    return df_new.groupby("operator")["date"].nunique().mean()

avg_date_by_operator(df_new)

Python
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Sort the name values in ascending order

df[df.duplicated(subset=['name'], keep=False)].sort_values('name')

Python
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) calculates the average temperature in countries (average av_temp by decade column grouped by country) 2) build a list of the 20 coldest countries in ascending av_temp order

import pandas as pd

# calculate the average temperature in countries
def average_temp_of_country(df):
    return df.groupby('country')['av_temp'].mean()

# build a list of the 20 coldest countries in ascending av_temp order
def coldest_20_countries(df):
    return df.groupby('country')['av_temp'].mean().sort_values()[:20]

Python
generate functionFri, 17 Mar 2023

Determine the sample size required to study the average coffee check in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

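from math import ceil
from scipy.stats import norm

def get_sample_size(delta, conf, sigsqr):
    # two-sided z quantile for the given confidence level (1.96 for conf = 0.95)
    z = norm.ppf((1 + conf) / 2)
    # n = z^2 * sigma^2 / delta^2, rounded up so the margin of error is not exceeded
    return ceil(sigsqr * z ** 2 / delta ** 2)

get_sample_size(100, 0.95, 2000)  # => 1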

Python
generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a "data" dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

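import numpy as np
import pandas as pd

def dataframe(sl, sw):
    # every value is wrapped in its own one-element list, so flatten before building the columns
    return pd.DataFrame({'sl': np.ravel(sl), 'sw': np.ravel(sw)})

sl = [[-0.90068117], [-1.14301691], [-1.38535265], [-1.50652052], [-1.02184904],
      [-0.53717756], [-1.50652052], [-1.02184904], [-1.74885626], [-1.14301691]]
# illustrative values: the task does not list the contents of sw
sw = [[3.5], [2.5], [2.4], [1.5], [3.5], [2.2], [2.1], [1.5], [1.1], [1.3]]
data = dataframe(sl, sw)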

Python
generate functionSat, 25 Feb 2023

x1 = (round(g1['rate_group'].value_counts() / g1['rate_group'].value_counts().sum() * 100, 2)) x2 = (round(g2['rate_group'].value_counts() / g2['rate_group'].value_counts().sum() * 100, 2)) x1.plot(kind='hist', alpha=0.5, bins=6, density=True) x2.plot(kind='hist', alpha=0.5, bins=6, density=True) plt.legend(loc='upper left') plt.title('Comparison of distributions with and without own housing') Add to the code above the caption of the legends on the chart: - for x1 = 'x1' - for x2 = 'x2'

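def my_function(x1, x2):
    # the label argument sets the legend caption of each histogram
    x1.plot(kind='hist', alpha=0.5, bins=6, density=True, label='x1')
    x2.plot(kind='hist', alpha=0.5, bins=6, density=True, label='x2')
    # the legend has to be created after the plots it describes
    plt.legend(loc='upper left')
    plt.title('Comparison of distributions with and without own housing')
    return x1, x2

my_function(x1, x2)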

Python
generate functionThu, 16 Feb 2023

There is a Pandas Series column: 'loan_amnt' 10 78 54 GOOD 64 23 There is a Pandas dataframe that has column 'loan_amnt' with both numbers and letters and words. Write a function that checks the rows in column 'loan_amnt' for numbers and letters. If the string has numbers it is converted to float64 If the string is not a number, it is replaced by np.nan. Example: replace_non_numbers(df, 'loan_amnt') df Result: 10 78 54 NaN 64 23

def replace_non_numbers(df, column):
    # values that cannot be parsed as numbers become NaN; the column is updated in place
    df[column] = pd.to_numeric(df[column], errors='coerce')

Python
generate functionSat, 28 Jan 2023

There is a Pandas dataframe: date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 2 2021-07-01 Yalta Skyline 15 12 38016 11655 8775 17586 3 2021-07-01 Alushta Breeze 36 25 75475 27972 21060 26443 4 2021-07-01 Alushta Moreon 20 17 51091 15540 11700 23851 5 2021-07-01 Gurzuf Alpina 14 12 38736 10878 8190 19668 Write a function divide_hotels that creates new columns big_hotels, medium_hotels, small_hotels, and adds values according to the following conditions: 1) if df['total_rooms'] > 30, then profit value is substituted into big_hotels column 2) if df['total_rooms'] > 20, then the profit value is substituted in the medium_hotels column 3) if df['total_rooms'] > 10, then profit value is substituted for small_hotels column

import pandas as pd

def divide_hotels(df):
    rooms = df['total_rooms']
    # each hotel falls into exactly one size band; the other two columns stay NaN
    df['big_hotels'] = df['profit'].where(rooms > 30)
    df['medium_hotels'] = df['profit'].where((rooms > 20) & (rooms <= 30))
    df['small_hotels'] = df['profit'].where((rooms > 10) & (rooms <= 20))
    return df

Python
generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) for each NaN line in this column, find a non-NaN line which has similar values in the pulse or respiratory_rate columns (the difference range is 20%) 3) replace NaN with values from the corresponding rows

def my_f(df, col_name='rectal_temp', tolerance=0.2):
    df = df.copy()
    # donor rows: those where the column is filled in
    donors = df[df[col_name].notna()]
    for idx, row in df[df[col_name].isna()].iterrows():
        for feature in ('pulse', 'respiratory_rate'):
            if pd.isna(row[feature]):
                continue
            # donors whose feature value lies within 20% of this row's value
            close = donors[(donors[feature] - row[feature]).abs() <= tolerance * abs(row[feature])]
            if not close.empty:
                # assumption: take the value from the first matching donor
                df.loc[idx, col_name] = close[col_name].iloc[0]
                break
    return df

Python
View
generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

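def replace(item, list1, replacing_number):
    found_first = False
    for row in list1:
        # the student number is the first field of each record
        if row[0] == item:
            if found_first:
                row[0] = replacing_number
            else:
                # the first match is left untouched
                found_first = True
    return list1

replace('4004', students, '9090')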

Python
View
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) leave only the rows in the year column with a value higher than 1980 2) calculates the average temperature of the countries (the average of the av_temp column) 3) builds a list of the 20 coldest countries in ascending av_temp order

def filter_coldest(df, year_threshold): return df[df['year'] > year_threshold]\ .groupby('country')['av_temp'].mean()\ .sort_values()[:20] filter_coldest(df, 1980)

Python
View
generate functionFri, 17 Mar 2023

There are two lists with numbers: sync = [85.1 83.8 69.9 82.1 84.4 80.4 78.1 88.4 77. 91.5 76.7 86.6 91.8 73.3 83.9 76.7 85.8 89.6 91.7 87.2 79. 85.3] asyncr = [89.8 81.6 87.4 81. 66.9 72.5 78.4 68.5 78.3 62.6 73.7 77.7 63. 77.5] Write code that will calculate the similarity statistics of these samples, p-value and display them

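from scipy import stats
import numpy as np

sync = [85.1, 83.8, 69.9, 82.1, 84.4, 80.4, 78.1, 88.4, 77., 91.5, 76.7, 86.6, 91.8, 73.3, 83.9, 76.7, 85.8, 89.6, 91.7, 87.2, 79., 85.3]
asyncr = [89.8, 81.6, 87.4, 81., 66.9, 72.5, 78.4, 68.5, 78.3, 62.6, 73.7, 77.7, 63., 77.5]

# Welch's t-test (unequal variances) compares the means of the two samples
statistic, p_value = stats.ttest_ind(sync, asyncr, equal_var=False)
print('t-statistic:', statistic)
print('p-value:', p_value)
print('variances:', np.var(sync), np.var(asyncr))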

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users who gave more than 100 ratings. The lifetime is the difference between the maximum and minimum values of the timestamp column for a given userId.

df = pd.read_csv('ratings.csv') df.head() df.columns df.groupby('userId').size() df.groupby('userId').size().index df.groupby('userId').size().values #df.groupby('userId').size().values > 100 df.groupby('userId').size().values[df.groupby('userId').size().values > 100] df.groupby('userId').size().index[df.groupby('userId').size().values > 100] df[df['userId'] == 1] df[df['userId'] == 1].timestamp df[df['userId'] == 1].timestamp.diff() df[df['userId'] == 1].timestamp.diff().min() df[df['userId'] == 1].timestamp.diff().max() df[df['userId'] == 1].timestamp.diff().max() - df[df['userId'] == 1].timestamp.diff().min() def get_lif
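The exploratory answer above breaks off before defining a function. As a rough sketch of the request, the per-user count and timestamp range can be computed in one `groupby` and then averaged; the name `average_lifetime` and the `min_ratings` parameter are assumptions, not part of the original answer.

```python
import pandas as pd

def average_lifetime(df, min_ratings=100):
    """Mean of (max - min) timestamp over users with more than min_ratings ratings."""
    per_user = df.groupby('userId')['timestamp'].agg(['count', 'min', 'max'])
    # keep only the users who gave more than min_ratings ratings
    active = per_user[per_user['count'] > min_ratings]
    return (active['max'] - active['min']).mean()
```

The result is in the same unit as `timestamp` (seconds for Unix timestamps).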

Python
View
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country 1 1743-12-01 NaN NaN Åland 2 1744-01-01 NaN NaN Åland 3 1744-02-01 NaN NaN Åland 4 1744-03-01 NaN NaN Åland 9 1744-08-01 NaN NaN Åland Create a new column year from date in which you convert all date values to years. Example: 1744-01-01 Date column: 1744

df = pd.DataFrame({'date': ['1743-12-01', '1744-01-01', '1744-02-01', '1744-03-01', '1744-08-01'], 'av_temp': [0, 10, 20, 30, 40], 'deviations': [0, 10, 20, 30, 40], 'country': ['Åland', 'Åland', 'Åland', 'Åland', 'Åland'] })
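The answer above only rebuilds a sample dataframe and never creates the `year` column. One possible way to do it, assuming the `date` values are ISO-formatted strings as in the prompt (the function name `add_year_column` is hypothetical):

```python
import pandas as pd

def add_year_column(df):
    """Return a copy of df with a 'year' column extracted from 'date'."""
    result = df.copy()
    # parse the strings as dates, then take the year component
    result['year'] = pd.to_datetime(result['date']).dt.year
    return result
```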

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

# Find the rows where the values in the name column are duplicated
# (keep=False marks the first occurrence and all subsequent ones).
duplicates = df[df.duplicated(subset='name', keep=False)]
# Sort the new dataframe by the name column in ascending order.
duplicates = duplicates.sort_values('name')

Python
View
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type

from collections import Counter def most_incident(df, column): # Count incident_type by route incident_id = Counter(df[column]) # Return key of the most incident_type return incident_id.most_common(1)[0][0] most_incident(df_new, 'route')

Python
View
generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return The result of the function should be three values: 12, 28, 21

# Import necessary libraries from math import sqrt, erf def interval(n, mean, sig, conf): z = erf(conf + 0.5) h = z * sig / sqrt(n) return int(h) # The result of the function should be three values: 12, 28, 21 interval(100, 100, 10, 0.95) interval(100, 100, 10, 0.99) interval(100, 100, 10, 0.995)
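`erf(conf + 0.5)` in the answer above is not a normal quantile, and the function returns the half-width rather than the length. A sketch of the usual z-interval follows, assuming the standard deviation is the known population value (so the normal rather than Student's distribution applies). The inputs that produce 12, 28 and 21 are not given in the prompt, so they are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

def interval(n, mean, sig, conf):
    # two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf((1 + conf) / 2)
    # half-width of the confidence interval
    h = z * sig / sqrt(n)
    # length = right end - left end = (mean + h) - (mean - h)
    return round(2 * h)
```

The length does not depend on `mean`, which only shifts the interval.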

Python
View
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type. Sort them in descending order by incident_type column

def get_most_incidents_route(df):
    # count incident_type by the route
    df_count = df.groupby('route').count()
    # sort the df_count by incident_type and get the first row
    df_count.sort_values(by='incident_type', inplace=True, ascending=False)
    return df_count.iloc[0]

get_most_incidents_route(df_new)

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 in this dataframe the names of films are not unique, because the same film may have been given different ratings Your task is to write a function that calculates the average rating of each movie and removes non-unique strings

import pandas as pd

data = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'title': ['Pulp Fiction (1994)', 'Three Colors: Red (Trois couleurs: Rouge) (1994)', 'Three Colors: Blue (Trois couleurs: Bleu) (1993)', 'Underground (1995)', 'Singin\' in the Rain (1952)', 'Dirty Dancing (1987)', 'Delicatessen (1991)', 'Ran (1985)', 'Seventh Seal, The (Sjunde inseglet, Det) (1957)', 'Bridge on the River Kwai, The (1957)'],
    'rating': [5.0, 3.5, 5.0, 5.0, 3.5, 4.0, 3.5, 3.5, 5.0, 4.0]
})

def film_rating(df):
    # one row per title with its mean rating
    return df.groupby('title')['rating'].mean().reset_index()

Python
View
generate functionSun, 18 Dec 2022

Rewrite the code with a command that removes unnecessary quotation marks (' ') in the dictionary: Here's the problem: Output: {' "user_id"': ' "category"}', ' "1840e0b9d4"': ' "Products"}', How it should be: Output: {"user_id": "category"}, "1840e0b9d4": "Products"}, The code: f = open('purchase_log.txt', encoding='utf-8') import re import json purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

import re

purchases = {}
with open('purchase_log.txt', encoding='utf-8') as f:
    for line in f:
        parts = re.split(r",|:", line.strip())
        # strip the surrounding spaces, quotation marks and braces
        key = parts[1].strip(' "{}')
        value = parts[3].strip(' "{}')
        purchases[key] = value

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Add an "av_ltv" column to the dataframe, in which you calculate the average lifetime of the users. Lifetime refers to the difference between the maximum and minimum values of the timestamp column for a given userId value.

lifetime = df.groupby('userId')['timestamp'].agg(['max', 'min'])
# per-user lifetime, indexed by userId
lifetime = lifetime['max'] - lifetime['min']
# map() aligns the per-user values back to the rows of df by userId
df['av_ltv'] = df['userId'].map(lifetime)

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

# keep=False marks every row whose name occurs more than once, including the first occurrence
data[data.name.duplicated(keep=False)].sort_values(by='name')

Python
View
generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy Add it to output all 66 lines of the dataframe

def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy
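The answer above returns the function unchanged. Notebooks and the console usually truncate long frames (the default `display.max_rows` is 60), which is the likely reason only part of the 66 rows was visible. A possible variant, assuming that is the problem, prints the whole result with `to_string()`:

```python
import pandas as pd

def year_leaders(dfp):
    leaders = (dfp.sort_values(by=['chart_debut', 'num_of_hits'], ascending=False)
                  .drop_duplicates(subset='chart_debut', keep='first')
                  .set_index('chart_debut'))
    # to_string() renders every row regardless of the display.max_rows option
    print(leaders.to_string())
    return leaders
```

Wrapping the display call in `pd.option_context('display.max_rows', None)` is an alternative that leaves the function itself untouched.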

Python
View
generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='What is the most preferred working environment for you?') fig.show()

def y_axis(question6): question6 = df[question6].value_counts() label = question6.index counts = question6.values return counts def x_axis(question6): question6 = df[question6].value_counts() label = question6.index counts = question6.values return label def plot(question6): fig = px.bar(x=y_axis(question6), y=x_axis(question6), orientation='h') fig.update_layout(title_text=question6) fig.show() plot('What is the most preferred working environment for you.')

Python
View
generate functionThu, 22 Dec 2022

Return a new array consisting of elements which are multiple of their own index in input array (length > 1). Some cases: [22, -6, 32, 82, 9, 25] => [-6, 32, 25] [68, -1, 1, -7, 10, 10] => [-1, 10] [-56,-85,72,-26,-14,76,-27,72,35,-21,-67,87,0,21,59,27,-92,68] => [-85, 72, 0, 68] [22] should equal [-6, 32, 25] [68] should equal [-1, 10] [11] should equal [-11] [-56] should equal [-85, 72, 0, 68] [28] should equal [38, -44, -99] [-1] should equal [-49, 8, -60, 35]

def multiple_of_index(arr): #your code here return []
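The stub above returns an empty list. A short sketch that matches the listed cases, assuming index 0 is skipped because nothing can be divided by it:

```python
def multiple_of_index(arr):
    # enumerate gives (index, value); index 0 is skipped to avoid division by zero
    return [value for index, value in enumerate(arr) if index != 0 and value % index == 0]
```

A zero at a non-zero index is kept, since 0 is a multiple of every index.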

Python
View
generate functionWed, 01 Feb 2023

There is a function for Pandas dataframe: dfp = df.copy() dfp.sort_values(by='performer', inplace=True) def group_by_performer(data): dfp['hits'] = dfp.groupby('performer')['song'].transform(lambda x: ', '.join(x.unique())) dfp.drop_duplicates(subset='performer', inplace=True) dfp.reset_index(inplace=True) return dfp group_by_performer(dfp) Add to it the function of sum of all values in the time_on_chart and consecutive_weeks columns if the values in the performer column are the same. The total should be displayed in the same columns (time_on_chart and consecutive_weeks)

dfp = df.copy() dfp.sort_values(by='performer', inplace=True) def group_by_performer(data): dfp['hits'] = dfp.groupby('performer')['song'].transform(lambda x: ', '.join(x.unique())) dfp['time_on_chart'] = dfp.groupby('performer')['time_on_chart'].transform(lambda x: x.sum()) dfp['consecutive_weeks'] = dfp.groupby('performer')['consecutive_weeks'].transform(lambda x: x.sum()) dfp.drop_duplicates(subset='performer', inplace=True) dfp.reset_index(inplace=True) return dfp group_by_performer(dfp)

Python
View
generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders(dfp, year): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy.loc[year] year_leaders(dfp, '1956')

Python
View
generate functionWed, 18 Jan 2023

Fix this function so that when it finds a value in a Pandas dataframe string, it returns its key: def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data.values(): if value in row: return key return 'undefined'

def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data[key]: if value in row: return key return 'undefined'

Python
View
generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a scaled_data dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

def create_dataframe(sl, sw): scaled_data = pd.DataFrame({ "sl": sl, "sw": sw }) return scaled_data create_dataframe(sl, sw)
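The lists in the prompt are column vectors (one value per inner list), and pandas expects one-dimensional data per column, so passing them directly is likely to fail. A minimal sketch that flattens them first, assuming both inputs have the same number of rows:

```python
import numpy as np
import pandas as pd

def create_dataframe(sl, sw):
    # np.ravel turns an (n, 1) array or nested list into a flat array of length n
    return pd.DataFrame({'sl': np.ravel(sl), 'sw': np.ravel(sw)})
```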

Python
View
generate functionWed, 08 Feb 2023

def get_needed_posts(query): for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) Modify the Python function above so that if it finds a repeated title or link, it will not add it to the return

def get_needed_posts(query): site = pd.DataFrame(columns=['date', 'title', 'link']) articles = BeautifulSoup(requests.get(f'https://habr.com/ru/search/?target_type=posts&q={query}').text, 'lxml').find('div', class_='content-list').find_all('article', class_='post post_preview') for article in articles: try: title = article.find('h2', class_='post__title').text date = article.find('span', class_='post__time').text.strip() link = article.find('h2', class_='post__title').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True)
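The answer above still appends every row, so repeated titles or links survive. One way to add the requested filtering, sketched here as a separate helper (`drop_repeated_posts` is a hypothetical name) applied to the collected dataframe before it is returned:

```python
import pandas as pd

def drop_repeated_posts(site):
    """Keep only the first row for each title and for each link."""
    return (site.drop_duplicates(subset='title')
                .drop_duplicates(subset='link')
                .reset_index(drop=True))
```

Inside `get_needed_posts` the last line would then become `return drop_repeated_posts(site)`. Replacing `pass` with `continue` in the `except` branch would also stop a failed parse from re-adding the previous article.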

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have rated 100 or more in the rating column

# import data import pandas as pd data = pd.read_csv("ml-100k/u.data", sep="\t", names=["user_id", "item_id", "rating", "timestamp"]) data.head() # create a new dataframe that counts the number of ratings per user user_data = data.groupby('user_id').count() user_data.head() # create a new column that is true if a user has rated 100 or more user_data['hasrated'] = user_data['rating'] >= 100 user_data.head() # create a new dataframe from the original data, where user_id is in the list # user_id's that have more than 100 ratings rating_data = data[data['user_id'].isin(user_data[user_data['hasrated'] == True].index)] rating_data.head()

Python
View
generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Add an "av_ltv" column to the dataframe, in which you calculate the average lifetime of the users. Lifetime refers to the difference between the maximum and minimum values of the timestamp column for a given userId value.

import pandas as pd df = pd.read_csv('./movies.csv') df['av_ltv'] = df.groupby('userId')['timestamp'].transform(lambda x: x.max() - x.min()) df

Python
View
generate functionTue, 24 Jan 2023

there are two Pandas dataframes: rzd client_id rzd_revenue 0 111 1093 1 112 2810 2 113 10283 3 114 5774 4 115 981 auto client_id auto_revenue 0 113 57483 1 114 83 2 115 912 3 116 4834 4 117 98 Create a new dataframe joined and merge these 2 dataframes inside it

import pandas as pd rzd = pd.DataFrame( { 'client_id': [111, 112, 113, 114, 115], 'rzd_revenue': [1093, 2810, 10283, 5774, 981] } ) auto = pd.DataFrame( { 'client_id': [113, 114, 115, 116, 117], 'auto_revenue': [57483, 83, 912, 4834, 98] } ) def merge_df(rzd, auto): joined = rzd.merge(auto, how='outer', on='client_id') return joined

Python
View
generate functionSat, 25 Feb 2023

This code gives an error "expected string or bytes-like object" Refactor it: import re def rate_group(int_rate): int_rate = re.sub('%', '', int_rate) int_rate = float(int_rate) if int_rate > 15.0: return '>15' elif 10.0 < int_rate <= 15.0: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(rate_group)

def rate_group(int_rate): int_rate = re.sub('%', '', int_rate) int_rate = float(int_rate) if int_rate > 15.0: return '>15' elif 10.0 < int_rate <= 15.0: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(rate_group)
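The answer above repeats the original code, so the error remains. "expected string or bytes-like object" is raised by `re.sub` when a value is not a string, typically a NaN or an already numeric rate. A sketch that tolerates both follows; the `'unknown'` label for missing values is an assumption.

```python
import pandas as pd

def rate_group(int_rate):
    # missing values cannot be grouped; the label used here is an assumption
    if pd.isna(int_rate):
        return 'unknown'
    # str() makes the cleanup work for both '12.5%' strings and plain numbers
    rate = float(str(int_rate).replace('%', '').strip())
    if rate > 15.0:
        return '>15'
    elif rate > 10.0:
        return '10-15'
    else:
        return '<10'
```

It is applied the same way as before: `df['rate_group'] = df['int_rate'].apply(rate_group)`.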

Python
View
generate functionWed, 21 Dec 2022

Change this function so that the employee's awards are also taken into account (add this variable to the class) and the employee's promotion condition is taken into account by the formula if (self.seniority)+(self.intlawards*2) % 7 == 0: self.grade_up() class Developer(Employee): def __init__(self, name, seniority): super().__init__(name, seniority) def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all of the developers pass the accreditation self.seniority += 1 # condition of promoting an employee from the presentation if self.seniority % 5 == 0: self.grade_up() # publication of the results return self.publish_grade()

def check_if_it_is_time_for_upgrade(self): if (self.seniority)+(self.intlawards*2) % 7 == 0: self.grade_up() return self.publish_grade()

Python
View
generate functionMon, 13 Feb 2023

Modify the function below so that it rounds to the nearest integer, but not less than 1 and not more than 4: def value_filler(row): filler = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median')) if filler < 1: return 1 elif filler > 4: return 4 else: return df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median')) df2['respiratory_rate'] = df2['respiratory_rate'].apply(value_filler)

def value_filler(row): filler = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median')) if filler < 1: return 1 elif filler > 4: return 4 else: return round(df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))) df2['respiratory_rate'] = df2['respiratory_rate'].apply(value_filler)
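Both versions above compare a whole Series inside `if`, which raises an ambiguity error. A vectorised sketch of the same idea, assuming the intent is to fill gaps with the per-pulse median, round, and then clamp to the range 1 to 4 (the function name and parameters are hypothetical):

```python
import pandas as pd

def fill_and_clip(df, col='respiratory_rate', group_col='pulse', low=1, high=4):
    # median of col within each group, aligned to the original rows
    medians = df.groupby(group_col)[col].transform('median')
    filled = df[col].fillna(medians)
    # round to the nearest integer, then clamp to [low, high]
    return filled.round().clip(lower=low, upper=high)
```

Usage would be `df2['respiratory_rate'] = fill_and_clip(df2)`; rows whose group has no known value stay NaN.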

Python
View
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) calculates the average temperature in countries (average av_temp by decade column grouped by country) 2) build a list of the 20 coldest countries in ascending av_temp order

import pandas import numpy as np df = pandas.DataFrame({'date': ['2013-05-01', '2013-06-01', '2013-07-01', '2013-08-01'], 'av_temp': [19.059, 17.613, 17.0, 19.759], 'deviations': [1.022, 0.473, 0.453, 0.717], 'country': ['Zimbabwe', 'Zimbabwe', 'Zimbabwe', 'Zimbabwe'], 'year': [2013, 2013, 2013, 2013], 'decade': ['2010-2020', '2010-2020', '2010-2020', '2010-2020']}) df
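The answer above only recreates the sample data. A sketch of the requested function follows, assuming "average temperature in countries" means the mean of `av_temp` per country over all rows; the name `coldest_countries` and the parameter `n` are hypothetical.

```python
import pandas as pd

def coldest_countries(df, n=20):
    """Names of the n countries with the lowest mean av_temp, coldest first."""
    mean_temp = df.groupby('country')['av_temp'].mean()
    # nsmallest returns the n lowest values in ascending order
    return mean_temp.nsmallest(n).index.tolist()
```

Grouping by `['country', 'decade']` instead would give one average per country and decade, if that is what the prompt intends.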

Python
View
generate functionWed, 01 Feb 2023

There is a function for Pandas dataframe: dfp = df.copy() dfp.sort_values(by='performer', inplace=True) def group_by_performer(data): dfp['hits'] = dfp.groupby('performer')['song'].transform(lambda x: ', '.join(x.unique())) dfp['time_on_chart'] = dfp.groupby('performer')['time_on_chart'].transform(lambda x: x.sum()) dfp['consecutive_weeks'] = dfp.groupby('performer')['consecutive_weeks'].transform(lambda x: x.sum()) dfp.drop_duplicates(subset='performer', inplace=True) dfp.reset_index(inplace=True) return dfp group_by_performer(dfp) Add a function to it to convert the values in the chart_debut column from the format '1966-06-25' to the format 1966, that is, just the year

def get_year(data): data['chart_debut'] = pd.to_datetime(data['chart_debut'], format='%Y-%m-%d').dt.year return data dfp = get_year(dfp)

Python
View
generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) calculate the correlation of this column with other columns: pulse, respiratory_rate, extremities_temp, pain_outcome 3) fill all NaNs in the 'rectal_temp' column with the average values multiplied by the correlation weight

def fill_rectal_temp(df, col_name = 'rectal_temp'): corr = df.corr()[col_name][1:] corr_dict = corr.to_dict() corr_dict.pop('surgery') corr_dict.pop('outcome') corr_dict.pop('age') corr_dict.pop('pain') mean_col1 = df[col_name].mean() df_copy = df.copy() df_copy[col_name] = df_copy[col_name].fillna(0) for key, value in corr_dict.items(): col_mean = df[key].mean() df_copy[key] = df_copy[key].fillna(mean_col1*value) df_copy[col_name] = df_copy[col_name] + value*df_copy[key] return df_copy fill_rectal_temp(df2)

Python
View
generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes): return water * (1.1 ** (clothes - load)) how_much_water(5, 10, 14)

Python
View
generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value of each column

import matplotlib.pyplot as plt from pandas.plotting import table fig, ax = plt.subplots(figsize=(10, 2)) # set size frame ax.xaxis.set_visible(False) # hide the x axis ax.yaxis.set_visible(False) # hide the y axis ax.set_frame_on(False) # no visible frame, uncomment if size is ok tabla = table(ax, df, loc='upper right', colWidths=[0.17]*len(df.columns)) # where df is your data frame tabla.auto_set_font_size(False) # Activate set fontsize manually tabla.set_fontsize(12) # if ++fontsize is necessary ++colWidths tabla.scale(1.2, 1.2) # change size table plt.savefig('table.png', transparent=True)
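The answer above renders the dataframe as a table image rather than a bar chart. A minimal sketch of the requested grouped vertical bars, using the plotting method built into pandas (the function name `plot_columns` is hypothetical):

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_columns(df):
    fig, ax = plt.subplots(figsize=(12, 4))
    # one group of bars per index value, one bar per column
    df.plot(kind='bar', ax=ax)
    ax.set_xlabel('index')
    ax.set_ylabel('value')
    ax.grid(axis='y')
    return ax
```

Calling `plot_columns(df)` followed by `plt.show()` displays the chart.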

Python
View
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def search_data(data, search_word):
    # rows where either class column equals the search word; NaN never matches
    mask = (data['Class 1'] == search_word) | (data['Class 2'] == search_word)
    return data[~mask]

search_data(grass, 'Rock')

Python
View
generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations.

from scipy import stats import pandas as pd def check_for_significance(a, b): t, p = stats.ttest_ind(a, b) df = pd.DataFrame({"t" : t, "p-value" : p}, index = [0]) return df

Python
View
generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes): if clothes <= load: return water else: return water * 1.1 ** (clothes - load)

Python
View
generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, only the 2nd value must be replaced by replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace(stnums, students, replacing_num): for st_num in stnums: for student in students: if st_num in student: student[1] = replacing_num return students
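The answer above overwrites `student[1]`, the surname, in every matching record. If "only the 2nd value" means the second record carrying a repeated student number, which is an assumption about the prompt, a sketch could count the matches per number:

```python
def replace_second_match(stnums, students, replacing_num):
    counts = {}
    for student in students:
        number = student[0]  # the student number is the first field
        if number in stnums:
            counts[number] = counts.get(number, 0) + 1
            # only the second occurrence is replaced
            if counts[number] == 2:
                student[0] = replacing_num
    return students
```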

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated

data[data.name.duplicated(keep=False)]

Python
View
generate functionWed, 22 Feb 2023

Modify this code to make a horizontal bar graph instead of a vertical (plotly.express library) question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = go.Figure(data=[go.Bar(x=label, y=counts)]) fig.update_layout(title_text='What is the most preferred working environment for you?') fig.show()

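import plotly.express as px

counts = df['What is the most preferred working environment for you.'].value_counts()
# orientation='h' puts the counts on the x-axis and the labels on the y-axis
fig = px.bar(x=counts.values, y=counts.index, orientation='h', title='What is the most preferred working environment for you?')
fig.show()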

Python
View
generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that will create a new dataframe hot_years and leave in it only those lines where av_temp > 12

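df_ru = pd.read_csv('https://raw.githubusercontent.com/OSU-geohackweek2020/tutorials/master/00_data/temperature/df_ru.csv')

def get_hot_years(df):
    # Keep only the rows where the average temperature exceeds 12
    return df[df['av_temp'] > 12]

hot_years = get_hot_years(df_ru)
hot_years.head()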

Python
View
generate functionThu, 02 Feb 2023

There is a Pandas dataframe: year date route operator group_name bus_garage bus_park injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Write a grouped_operators function that groups all values by operator column and sums all dates for each unique operator value

import pandas as pd data = {'year':[2015,2015,2015], 'date':['01.01.2015','01.01.2015','01.01.2015'], 'route':[1,4,5], 'operator':['London General','Metroline','East London'], 'group_name':['Go-Ahead','Metroline','Stagecoach'], 'bus_garage':['Southwark','Islington','Havering'], 'bus_park':['Garage Not Available','Garage Not Available','Garage Not Available'], 'injury_result':['Injuries treated on scene','Injuries treated on scene','Taken to Hospital – Reported Serious Injury or...'], 'incident_type':['Onboard Injuries','Onboard Injuries','Onboard Injuries'], 'victim_category':['Passenger','Passenger','Passenger'], 'victim_sex':['Male','Male','Male'], 'victim_age':['Child','Unknown','Elderly']

Python
View
generate functionWed, 21 Dec 2022

Change the class so that self.seniority defaults to 1 and self.intlawards = 2 class Designer(Employee): def __init__(self, name, seniority, awards): super().__init__(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority) + (self.intlawards*2) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

function that: takes two numbers as input and returns True if the first is bigger than the second and False otherwise.
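For the Designer request above, a minimal sketch is shown here. The Employee base class is not given in the source, so the stub below (with grade, grade_up and publish_grade) is a hypothetical stand-in. The promotion check is parenthesized on the assumption that the whole sum is meant to be taken modulo 7; as originally written, % binds tighter than +, so only the awards term would be reduced.

```python
class Employee:
    """Hypothetical stand-in for the base class, which the source does not show."""

    def __init__(self, name, seniority):
        self.name = name
        self.seniority = seniority
        self.grade = 1

    def grade_up(self):
        self.grade += 1

    def publish_grade(self):
        return f"{self.name} {self.grade}"


class Designer(Employee):
    def __init__(self, name, seniority=1, awards=2):
        # seniority defaults to 1 and intlawards to 2, as requested
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        self.seniority += 1
        # Assumption: the whole sum is taken modulo 7
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        return self.publish_grade()


designer = Designer("Elena")
first = designer.check_if_it_is_time_for_upgrade()   # seniority 2: 2 + 4 = 6, no upgrade
second = designer.check_if_it_is_time_for_upgrade()  # seniority 3: 3 + 4 = 7, upgrade
```

Both defaults can still be overridden, for example Designer("Elena", seniority=4, awards=0).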

Python
View
generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g def total_ingredients(cook_book): dish = 'salad' portions = 5 grams = cook_book['quantity'] * portions if key == 'salad': grams = 'quantity' * portions print(grams) cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} total_ingredients(cook_book)

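def total_ingredients(cook_book, portions=5):
    # Sum quantities per (ingredient, measure) pair so that the same
    # ingredient with different measures is counted separately
    totals = {}
    for ingredients in cook_book.values():
        for item in ingredients:
            key = (item['ingridient_name'], item['measure'])
            totals[key] = totals.get(key, 0) + item['quantity'] * portions
    for (name, measure), quantity in totals.items():
        print(f'{name.capitalize()}: {quantity} {measure}')

cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}
total_ingredients(cook_book)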

Python
View
generate functionThu, 19 Jan 2023

There is a Pandas dataframe: place decade perc_of_5star 0 1 2010-2020 2.300 1 2 1900-1910 1.379 2 3 1970-1980 1.179 3 4 2000-2010 1.176 4 5 1960-1970 1.133 build a horizontal barchart with perc_of_5star columns in descending order and decade values on the y-axis

def plot_city_reviews(df):
    fig, ax = plt.subplots()
    # x='decade' puts the decade labels on the vertical axis of the horizontal chart
    df.sort_values('perc_of_5star', ascending=False)[:10].plot.barh(x='decade', y='perc_of_5star', figsize=(10,6), ax=ax)
    _ = ax.set(ylabel='Decade', xlabel='% of 5-star reviews')

Python
View
generate functionWed, 08 Mar 2023

Upgrade the code so that a roll of 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090 will drop 1 dollar for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

def ends77(num): return num % 100 == 77 def ends7(num): return num % 10 == 7 def ends00(num): return num % 100 == 0 def ends0(num): return num % 10 == 0
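One way to fold the new rule into the loop is to move the payout logic into a single function. The sketch below assumes that "drop 1 dollar" means the roll pays out 1 dollar, and that the new rule takes precedence over the 77/7/0 rules for rolls below 100 (otherwise 077 would still pay 5). The names payout and simulate are hypothetical. Note also that np.random.randint(1, 1000) never returns 0, so a roll of 000 is only possible when drawing from 0 to 999, done here with the standard library's random.randrange(1000).

```python
import random

JACKPOTS = {777: 200, 999: 100, 555: 50, 333: 15, 111: 10}


def payout(num):
    """Return the change in money for one roll (negative means a loss)."""
    if num in JACKPOTS:
        return JACKPOTS[num]
    # New rule (assumed to win 1 dollar): 000, 007..097 and 010..090
    if num < 100 and num % 10 in (0, 7):
        return 1
    if num % 100 == 77:
        return 5
    if num % 10 == 7:
        return 3
    if num % 100 == 0:
        return 2
    if num % 10 == 0:
        return 1
    return -1


def simulate(trials, seed=None):
    """Play the given number of rolls and return (money, successes)."""
    rng = random.Random(seed)
    money = 0
    successes = 0
    for _ in range(trials):
        change = payout(rng.randrange(1000))  # rolls 000..999
        money += change
        if change > 0:
            successes += 1
    return money, successes
```

Keeping the rules in one function makes their order of precedence explicit and easy to check roll by roll.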

Python
View
generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to rows

def row_from_columns(df): return df.melt(id_vars=['spi_rank', 'country', 'spi_score']) row_from_columns(df)

Python
View
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def RemoveRockGrass(dataset):
    # Keep only the rows where neither class column equals 'Rock'
    return dataset[(dataset['Class 1'] != 'Rock') & (dataset['Class 2'] != 'Rock')]

Python
View
generate functionTue, 13 Dec 2022

create a password verification function in one line. The verification conditions are: 1) the length should be bigger than 6; 2) should contain at least one digit. Input: A string. Output: A bool. Examples: assert is_acceptable_password("short") == False assert is_acceptable_password("muchlonger") == False assert is_acceptable_password("ashort") == False assert is_acceptable_password("muchlonger5") == True

def is_acceptable_password(password: str) -> bool: return len(password)>6 and any(char.isdigit() for char in password) is_acceptable_password("muchlonger5")

Python
View
generate functionSun, 18 Dec 2022

Replace re.split with regular split to get the same result (you can use the json library and the loads function) The code: f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values
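The returned code still uses re.split. A minimal sketch of the json.loads approach follows, assuming each line of purchase_log.txt is a JSON object with user_id and category keys (inferred from the output quoted in another request on this page, not confirmed by the source). An in-memory sample stands in for the file so the example runs on its own.

```python
import io
import json

# Stand-in for open('purchase_log.txt', encoding='utf-8'); the sample lines are assumed
f = io.StringIO(
    '{"user_id": "user_id", "category": "category"}\n'
    '{"user_id": "1840e0b9d4", "category": "Products"}\n'
)

purchases = {}
for line in f:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    record = json.loads(line)
    purchases[record["user_id"]] = record["category"]

print(purchases)
```

Because json.loads parses the quoting itself, the keys and values come out without stray spaces, quotes or braces.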

Python
View
generate functionMon, 26 Dec 2022

There are 2 functions. One looks for duplicate values and makes a dictionary. The second changes the duplicates to the desired value. The first function (find_duplicates) works correctly. But the second function (change_duplicates) is incomplete. Finish the code so that the duplicate student number changes to the one in brackets in the change_duplicates function. def find_duplicates(lst): stnums = [] cnt = {} for el in lst: stnums.append(el[0]) for i in stnums: if i in cnt: cnt[i] += 1 else: cnt[i] = 1 return cnt def change_duplicates(lst, replacing_num): stnums = [] duplicates = find_duplicates(lst) for k, v in duplicates.items(): if v > 1: stnums = [k] for student in lst: if stnums in lst: #need right code#

#Python 3.7.3
#https://stackoverflow.com/questions/57774616/find-duplicates-in-list-of-tuples-in-python
def find_duplicates(lst):
    stnums = []
    cnt = {}
    for el in lst:
        stnums.append(el[0])
    for i in stnums:
        if i in cnt:
            cnt[i] += 1
        else:
            cnt[i] = 1
    return cnt

def change_duplicates(lst, replacing_num):
    duplicates = find_duplicates(lst)
    seen = set()
    for student in lst:
        num = student[0]
        # replace a duplicated number from its second occurrence onward
        if duplicates[num] > 1 and num in seen:
            student[0] = replacing_num
        seen.add(num)
    return lst

students = [["1", "John", "Biology", "A"], ["2", "Mary", "Math", "C"], ["3", "Alex", "Computer Science", "B"], ["3", "Alex", "Computer Science", "B"]]
change_duplicates(students, "9090")
print(students)

Python
View
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of "date" string values for all "operator"

df_new.groupby('operator')['date'].count().mean()

Python
View
generate functionSat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Construct a bar graph with 'hotel' values in the x axis and change values in the 'date' column and 'av_revenue' values in the y axis

hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) hotels_rev %matplotlib inline hotels_rev['hotel'].value_counts() hotels_rev.plot('hotel', 'av_revenue', kind = 'bar')

Python
View
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type

import pandas as pd df = pd.read_csv('../data/bus_data.csv') df = df[df['incident_type']=='Onboard Injuries'][['route', 'incident_type']] df.groupby('route')['incident_type'].count().sort_values(ascending=False)

Python
View
generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] For index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['message'], 'actual': df.iloc[index]['label'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it

misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'Message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified)

Python
View
generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that finds the mean values of each column and transposes them to rows

import pandas as pd df = pd.read_csv('SDG_data.csv') def mean_row(data): df_2 = data[["basic_human_needs", "foundations_of_wellbeing", "opportunity", "basic_nutri_med_care", "water_sanitation", "shelter", "personal_safety", "access_to_knowledge", "access_to_communications", "health_wellness", "environmental_quality", "personal_rights", "personal_freedom", "inclusiveness", "access_to_advanced_education"]] mean_df = pd.DataFrame(df_2.mean(axis=0)).T return mean_df

Python
View
generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value

import pandas as pd import matplotlib.pyplot as plt data = {'water': [1, 2, 3, 4, 2, 4, 2, 4, 5, 2, 3, 4, 2, 1, 3, 4, 3, 2, 5, 1], 'nutri': [1, 2, 4, 6, 5, 6, 7, 5, 4, 5, 6, 7, 4, 3, 5, 5, 6, 5, 4, 3], 'mineral': [2, 1, 1, 3, 2, 4, 2, 4, 5, 4, 3, 2, 3, 2, 3, 1, 3, 4, 5, 1]} df = pd.DataFrame(data) df.plot(kind='bar', stacked=True)

Python
View
generate functionMon, 26 Dec 2022

There are two lists. The first (stnums) contains rows that are considered duplicates. The second list is a list of students. Write a function that replaces the values in the second list that match the first (but only the 2nd, 3rd value, and so on, not the first one). stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def replace(stnums, students): # your code here
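The stub above can be filled in along these lines. This is a sketch that assumes the student number is always the first field of each row and that the replacement value is passed in as replacing_num (the request does not name one; the default '9090' is borrowed from the neighbouring requests). The shortened sample rows are for illustration only.

```python
def replace(stnums, students, replacing_num="9090"):
    """Replace listed student numbers from their second occurrence onward."""
    seen = set()
    for student in students:
        number = student[0]
        if number in stnums:
            if number in seen:
                student[0] = replacing_num  # 2nd, 3rd, ... occurrence
            else:
                seen.add(number)            # first occurrence is left untouched
    return students


stnums = ["4004"]
students = [
    ["1102", "Bogov", "Artem"],
    ["4004", "Stepanova", "Natalia"],
    ["0045", "Bokov", "Igor"],
    ["4004", "Potapov", "Dmitry"],
]
result = replace(stnums, students)
```

The function modifies the rows in place and also returns the list, matching the style of the other answers on this page.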

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

hotel = pd.read_csv('hotel_bookings.csv', sep=';') hotel.duplicated(['name']) hotel[hotel.duplicated(['name'], keep = False)].sort_values(by = 'name')

Python
View
generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace_nums(stnums, students, replacing_num):
    seen = set()
    for i in range(len(students)):
        num = students[i][0]
        if num in stnums:
            if num in seen:
                # second and later matches are replaced
                students[i][0] = replacing_num
            else:
                # the first match is left untouched
                seen.add(num)
    return students

replace_nums(stnums, students, replacing_num)

Python
View
generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

# Challenge 1
import pandas as pd

def duplicates(df):
    # keep=False marks the first occurrence as well as the later ones
    return df[df['name'].duplicated(keep=False)].sort_values(by='name', ascending=True).reset_index(drop=True)

df = pd.DataFrame({'name': ['Ksenia Rodionova', 'Artur Petrov', 'Ivan Sidorov', 'Ksenia Rodionova'], 'date': ['2021-07-01', '2021-07-02', '2021-07-03', '2021-07-04']})
duplicates(df)

Python
View
generate functionFri, 03 Mar 2023

Correct the code of these plotly plots so that the y-axis is a logarithmic scale fig = go.Figure() fig.add_trace(go.Bar( x=data['Country'], y=data['Total Cases'], name='Infected', marker_color='indianred')) fig.add_trace(go.Bar( x=data['Country'], y=data['Total Deaths'], name='Deceased', marker_color='lightsalmon')) fig.update_layout(barmode='group', xaxis_tickangle=-45) fig.show()

fig = go.Figure() fig.add_trace(go.Bar( x=data['Country'], y=data['Total Cases'], name='Infected', marker_color='indianred')) fig.add_trace(go.Bar( x=data['Country'], y=data['Total Deaths'], name='Deceased', marker_color='lightsalmon')) fig.update_layout(barmode='group', xaxis_tickangle=-45, yaxis_type="log") fig.show()

Python
View
generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to strings

def convert_to_str(dataframe): rank = str(dataframe['spi_rank']) country = dataframe['country'] return rank + ' ' + country convert_to_str(df.iloc[0])

Python
View
generate functionSat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Construct a bar graph with 'hotel' values in the x axis and change values in the 'date' column and 'av_revenue' values in the y axis

def bar_graph(df): data = df data1 = df[['date', 'revenue']] data2 = data.groupby(['date']).mean() data2['date'] = data2.index data3 = data2.merge(data1, on = 'date') data3 = data3.groupby('hotel', as_index=False).diff() data3['in_percent'] = data3.apply(lambda row: round((row['revenue_x'] / row['revenue_y'] - 1) * 100, 2), axis = 1) data3.head() plt.figure(figsize=(12, 8)) plt.bar(data3['hotel'], data3['in_percent']) plt.xlabel('hotel') plt.ylabel('change in percent') plt.title('change in %') plt.show()

Python
View
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Plot a bar chart with number_of_hits on the x-axis and performer on the y-axis.

import pandas as pd import matplotlib.pyplot as plt def plot_top_performers(dataframe, column, number_of_hits): return dataframe.nlargest(number_of_hits, column).plot.barh(x='performer', y='num_of_hits', title='Top 20 Performers') data = pd.DataFrame({'performer' : ['Glee Cast', 'Taylor Swift', 'Drake', 'YoungBoy Never Broke Again', 'Aretha Franklin', 'The Beatles'], 'num_of_hits' : [191, 166, 125, 75, 66, 66]}) plot_top_performers(data, 'num_of_hits', 6)

Python
View
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country 1 1743-12-01 NaN NaN Åland 2 1744-01-01 NaN NaN Åland 3 1744-02-01 NaN NaN Åland 4 1744-03-01 NaN NaN Åland 9 1744-08-01 NaN NaN Åland Create a new column year from date in which you pull out the first 4 digits of the year. Example: 1744-01-01 Date column: 1744 Example: 1744-01-01 Date column: 1744

def yearExtractor(date):
    year = int(date[:4])
    return year

# the column is named 'date' (lowercase) in the dataframe shown above
df['year'] = df['date'].apply(yearExtractor)

Python
View
generate functionWed, 08 Feb 2023

def get_needed_posts(query): site = pd.DataFrame() for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles-list__item') for article in articles: try: title = article.find('h2', class_='tm-article-snippet__title').text date = article.find('span', class_='tm-article-snippet__datetime-published').text.strip() link = article.find('h2', class_='tm-article-snippet__title').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) Modify the function above so that if it finds a repeated title or link, it will not add it to the return

def get_needed_posts(query): site = pd.DataFrame() links = [] titles = [] for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles-list__item') for article in articles: try: title = article.find('h2', class_='tm-article-snippet__title').text date = article.find('span', class_='tm-article-snippet__datetime-published').text.strip() link = article.find('h2', class_='tm-article-snippet__title').find('a').get('href') if link not in links and title not in titles: titles.append(title) links.append(link) row = {'date': date, '
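The generated code breaks off mid-line, so it is left as is. The deduplication idea it starts on can be illustrated separately from the scraping: the sketch below filters already-parsed rows, keeping a row only if neither its title nor its link has been seen before. The helper name dedupe_rows and the sample rows are hypothetical; sets are used instead of lists for faster membership checks.

```python
def dedupe_rows(rows):
    """Keep only rows whose title and link have both not been seen before."""
    seen_titles = set()
    seen_links = set()
    unique = []
    for row in rows:
        if row["title"] in seen_titles or row["link"] in seen_links:
            continue
        seen_titles.add(row["title"])
        seen_links.add(row["link"])
        unique.append(row)
    return unique


rows = [
    {"date": "2023-02-01", "title": "First post", "link": "/post/1/"},
    {"date": "2023-02-02", "title": "First post", "link": "/post/2/"},  # repeated title
    {"date": "2023-02-03", "title": "Other post", "link": "/post/1/"},  # repeated link
    {"date": "2023-02-04", "title": "Third post", "link": "/post/3/"},
]
unique_rows = dedupe_rows(rows)
```

Inside a scraper, the same check would sit just before the row is appended to the result.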

Python
View
generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns hapscore = pd.DataFrame({'2015': [df15['Happiness Score'].mean()], '2016': [df16['Happiness Score'].mean()], '2017': [df17['Happiness.Score'].mean()], '2018': [df18['Score'].mean()], '2019': [df19['Score'].mean()]}) 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

def graph(x, y): plt.plot(x, y) plt.show() graph(hapscore.columns, hapscore.loc[0])

Python
View
generate functionSun, 18 Dec 2022

Rewrite the code with a command that removes unnecessary quotation marks (' ') in the dictionary: Here's the problem: Output: {' "user_id"': ' "category"}', ' "1840e0b9d4"': ' "Products"}', How it should be: Output: {"user_id": "category"}, "1840e0b9d4": "Products"}, The code: f = open('purchase_log.txt', encoding='utf-8') import re import json purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

def remove_quotes(s):
    # strip surrounding spaces, braces and quotation marks
    return s.strip(' {}"\'')

keys = remove_quotes(line[1])
values = remove_quotes(line[3])

Python
View
generate functionFri, 17 Mar 2023

Write a function that will determine the size of the necessary sample for the study if the following parameters are known: - error (delta); - variance (sigsqr); - Confidence level (conf). Round up the answer to a whole number.

import math
from scipy.stats import norm

# function that determines the size of the necessary sample for the study
def sample_size(delta, sigsqr, conf):
    # significance level
    alpha = 1 - conf
    # standard deviation
    sigma = math.sqrt(sigsqr)
    # sample size
    n = (norm.ppf(1 - alpha / 2) * sigma / delta) ** 2
    # rounding up n
    return math.ceil(n)

# calling function
sample_size(0.1, 1, 0.95)
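The formula used above is n = (z * sigma / delta)^2, where z is the two-sided normal quantile for the chosen confidence level and sigma is the square root of the variance. As a cross-check, the same calculation can be sketched with only the standard library, with statistics.NormalDist().inv_cdf standing in for norm.ppf. The function name sample_size_stdlib is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_stdlib(delta, sigsqr, conf):
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    sigma = math.sqrt(sigsqr)                # standard deviation from the variance
    return math.ceil((z * sigma / delta) ** 2)


n = sample_size_stdlib(0.1, 1, 0.95)
```

For delta = 0.1, variance 1 and a 95% confidence level, z is about 1.96, so the unrounded value is about 384.15 and the function returns 385.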

Python
View
generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that will create a new dataframe hot_years and leave in it only those lines where av_temp > 12

def filter_hot_years(df, hot_temp=12, country='Russia'):
    df_hot = df[(df['av_temp'] > hot_temp) & (df['country'] == country)]
    return df_hot

# the default threshold of 12 matches the request
hot_years = filter_hot_years(df_ru)

Python
View
generate functionThu, 09 Feb 2023

There is 'actors' Pandas dataframe: actorid name date_of_birth birth_city birth_country height_inches biography gender ethnicity networth 1002 1155 Jackie Chan 1954-04-07 Victoria Peak Hong Kong 68.0 Hong Kong's cheeky, lovable and best known fil... Male NaN 400000000.0 1387 69 Keanu Reeves 1964-09-02 Beirut Lebanon 73.0 Keanu Charles Reeves, whose first name means "... Male Lebanese 360000000.0 2252 141 Sean Connery 1930-08-25 Edinburgh UK 74.0 The tall, handsome and muscular Scottish actor... Male White 350000000.0 291 6 Bruce Willis 1955-03-19 Idar-Oberstein West Germany 72.0 Actor and musician Bruce Willis is well known ... Male White 250000000.0 Write a function that sorts all rows in the 'networth' column in descending order and creates a new dataframe with only the first 393 rows remaining

def sort_and_return(df): df = df.sort_values('networth', ascending=False) df = df[0:393] return df

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

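def group_by_title(df):
    # One row per unique title with its average rating
    return df.groupby('title', as_index=False)['rating'].mean()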

Python
View
generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

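def highlight_max_min(df):
    # Chain both calls on one Styler so neither highlight is discarded;
    # the result is a Styler object, not a plain dataframe
    return (
        df.style
        .highlight_max(axis=1, props='color:white; font-weight:bold; background-color:green;')
        .highlight_min(axis=1, props='color:white; font-weight:bold; background-color:brown;')
    )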

Python
View
generate functionThu, 22 Dec 2022

Return a new array consisting of elements which are multiple of their own index in input array (length > 1). Some cases: [22, -6, 32, 82, 9, 25] => [-6, 32, 25] [68, -1, 1, -7, 10, 10] => [-1, 10] [-56,-85,72,-26,-14,76,-27,72,35,-21,-67,87,0,21,59,27,-92,68] => [-85, 72, 0, 68] [22] should equal [-6, 32, 25] [68] should equal [-1, 10] [11] should equal [-11] [-56] should equal [-85, 72, 0, 68] [28] should equal [38, -44, -99] [-1] should equal [-49, 8, -60, 35]

def multiple_of_index(arr): new_arr = [] for i in range(1, len(arr)): if arr[i] % i == 0: new_arr.append(arr[i]) return new_arr

Python
View
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

def group_avg_ratings(df): #your code here df = pd.DataFrame({'id':[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9], 'title':['Pulp Fiction (1994)','Three Colors: Red (Trois couleurs: Rouge) (1994)','Three Colors: Blue (Trois couleurs: Bleu) (1993)','Underground (1995)','Singin\' in the Rain (1952)','Dirty Dancing (1987)','Delicatessen (1991)','Ran (1985)','Seventh Seal, The (Sjunde inseglet, Det) (1957)','Bridge on the River Kwai, The (1957)','Pulp Fiction (1994)','Three Colors: Red (Trois couleurs: Rouge) (1994)','Three Colors: Blue (Trois couleurs: Bleu) (1993)','Underground (1995)','Singin\' in the Rain (1952)','

Python
View
generate functionTue, 21 Mar 2023

There are 3 lists: water, nutri and mineral. They differ in length: print(len(water)) - 20 print(len(nutri)) -21 print(len(mineral)) -21 add them to df Pandas dataframe in the following format: index treatments value 0 0 water 25 1 1 water 30 2 2 water 28 3 3 nutri 36 4 4 nutri 29 5 0 nutri 45 6 1 mineral 55 7 2 mineral 29 8 3 mineral 56

def make_df(water, nutri, mineral): water = pd.DataFrame({'treatments': 'water', 'value': water}, index=list(range(len(water)))) nutri = pd.DataFrame({'treatments': 'nutri', 'value': nutri}, index=list(range(len(nutri)))) mineral = pd.DataFrame({'treatments': 'mineral', 'value': mineral}, index=list(range(len(mineral)))) df = pd.concat([water, nutri, mineral]) return df make_df(water, nutri, mineral)

Python
View
generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

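def misclassified_messages(y_test, predicted):
    # Assumes y_test is a pandas Series that kept df's row labels after the split.
    # y_test[index] then looks up a label, not a position, which raises KeyError.
    misclassified = []
    for position, label_predicted in enumerate(predicted):
        actual = y_test.iloc[position]
        if label_predicted != actual:
            # Fetch the source row by its original label in df
            row = df.loc[y_test.index[position]]
            misclassified.append({'message': row['Message'], 'actual': actual, 'predicted': label_predicted})
    misclassification_df = pd.DataFrame(misclassified)
    return misclassification_df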

Python
generate functionSat, 04 Mar 2023

Correct the code to correctly output the dataframe with this data: gdpdiff = pd.DataFrame({'Top1': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1], 'Top20': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20]}) gdpdiff

def gdp(df19):
    # Sort once, then keep the 1st and 20th rows of the ranking.
    # Passing two DataFrames as dict values to pd.DataFrame does not build the intended table.
    top20 = df19[['Country or region', 'GDP per capita']]\
        .sort_values(by='GDP per capita', ascending=False).head(20)
    dataset = top20.iloc[[0, 19]].copy()
    dataset.index = ['Top1', 'Top20']
    return dataset

gdp(df19)

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column. All values of both time_on_chart and consecutive_weeks columns must be summed if the performer value matches.

def get_songs_from_artists(df): # do something return df
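The stub above leaves the body empty. A minimal sketch of one way to fill it in, assuming the grouping column is named performer (the second table in the prompt calls it artist) and that the song titles live in a song column; the three sample rows are copied from the prompt:

```python
import pandas as pd

def get_songs_from_artists(df):
    # One row per performer: unique song titles joined with ", ",
    # and both counters summed across that performer's rows.
    return df.groupby('performer', as_index=False).agg(
        hits=('song', lambda s: ', '.join(s.unique())),
        time_on_chart=('time_on_chart', 'sum'),
        consecutive_weeks=('consecutive_weeks', 'sum'),
    )

# Sample rows taken from the prompt
df = pd.DataFrame({
    'song': ['Stupid Cupid', 'Chantilly Lace', 'Chantilly Lace'],
    'performer': ['Connie Francis', 'Big Bopper', 'Big Bopper'],
    'time_on_chart': [12, 18, 19],
    'consecutive_weeks': [11.0, 17.0, 18.0],
})
result = get_songs_from_artists(df)
print(result)
```

groupby sorts the performers alphabetically by default, so the order of the rows may differ from the input.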

Python
generate functionSat, 25 Feb 2023

There is a Pandas dataframe: loan_amnt int_rate 0 5000 10.65% 1 2500 15.27% 2 2400 15.96% 3 10000 13.49% 4 3000 12.69% 5 5000 7.90% 6 7000 15.96% You need to split the values of the int_rate column into categories. Important: The int_rate column is in str data format. Use regex to remove the '%' signs at the end of strings. Then convert the values to float format. Then write a function that creates the rate_group column and adds values to it as follows: 1) if int_rate > 15.00%, rate_group should be '>15' 2) if int_rate is between 10.00% and 15.00%, then rate_group should be '10-15'

import re

def rate_group(int_rate):
    # Strip the '%' sign and convert the string to float
    int_rate = re.sub('%', '', int_rate)
    int_rate = float(int_rate)
    if int_rate > 15.0:
        return '>15'
    elif 10.0 < int_rate <= 15.0:
        return '10-15'
    else:
        return '<10'

df['rate_group'] = df['int_rate'].apply(rate_group)

Python
generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

df_year_leaders = year_leaders(df_performers)
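The answer above only calls year_leaders without defining it. A possible definition is sketched below, assuming the column names from the prompt; the "Example Artist" row is hypothetical and was added only so that one year has two candidates:

```python
import pandas as pd

def year_leaders(df):
    # Row label of the performer with the most hits within each debut year
    idx = df.groupby('chart_debut')['num_of_hits'].idxmax()
    # Keep only performer and hits, indexed by debut year
    return df.loc[idx, ['chart_debut', 'performer', 'hits']].set_index('chart_debut')

df_performers = pd.DataFrame({
    'performer': ['Glee Cast', 'Taylor Swift', 'Drake', 'Example Artist'],
    'hits': ['Somebody To Love, Friday', 'Fifteen, Fearless', 'Summer Sixteen', 'Example Song'],
    'chart_debut': [2009, 2008, 2016, 2009],
    'num_of_hits': [191, 166, 125, 10],
})
df_year_leaders = year_leaders(df_performers)
print(df_year_leaders)
```

If two performers tie within a year, idxmax keeps the first one it encounters.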

Python
generate functionWed, 18 Jan 2023

Fix this function so that when it finds a value in a Pandas dataframe string, it returns its key: def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data.values(): if value in row: return key return 'undefined'

def geo_class(row):
    geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'],
                'Northwest': ['petersburg', 'pskov', 'murmansk'],
                'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']}
    # Return the region whose city name occurs inside the string (case-insensitive)
    for key, cities in geo_data.items():
        for city in cities:
            if city.lower() in row.lower():
                return key
    return 'undefined'

Python
generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value of each column

import pandas as pd import matplotlib.pyplot as plt data = {'water': [1, 2, 3, 4, 2, 4, 2, 4, 5, 2, 3, 4, 2, 1, 3, 4, 3, 2, 5, 1], 'nutri': [1, 2, 4, 6, 5, 6, 7, 5, 4, 5, 6, 7, 4, 3, 5, 5, 6, 5, 4, 3], 'mineral': [2, 1, 1, 3, 2, 4, 2, 4, 5, 4, 3, 2, 3, 2, 3, 1, 3, 4, 5, 1]} data = pd.DataFrame(data, index = range(20)) plt.figure(figsize = (12, 4)) plt.bar(data.index, data['water'], color = 'r', label = 'water') plt.bar(data.index, data['nutri'], color = 'y', label = 'nutri', bottom = data

Python
generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

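def highlight_max_and_min(df):
    # Styler methods return the Styler itself, so both highlights can be chained.
    # The result is a Styler for display; the underlying DataFrame is not modified.
    return (df.style
            .highlight_max(axis=1, props='color:white; font-weight:bold; background-color:green;')
            .highlight_min(axis=1, props='color:white; font-weight:bold; background-color:brown;'))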

Python
generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut', 'num_of_hits'], ascending=False, inplace=True)
    # drop_duplicates was removed here so that every row is kept
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

year_leaders(dfp)

Python
generate functionThu, 20 Jul 2023

You apply a convolution module to the data: y=convolve(x,kernel,bias), where x is the input sequence, kernel is the convolution kernel, bias are the shift parameters for each output channel. Write a function calculate_conv_x_grad that finds the value of the derivative of the result given the input diff(y)/diff(x). import sys import ast import numpy as np def parse_array(s): return np.array(ast.literal_eval(s)) def read_array(): return parse_array(sys.stdin.readline()) def write_array(arr): print(repr(arr. tolist())) def calculate_conv_x_grad(x, y, kernel, bias): """ x - InLen x InChannels y - OutLen x OutChannels kernel - OutChannels x InChannels x KernelSize bias - OutChannels returns InLen x InChannels """ x = read_array() y = read_array() kernel = read_array() bias = read_array() result = calculate_conv_x_grad(x, y, kernel, bias) write_array(result) Use the following values for testing and debugging: sample input: [[0.5031766517322117, 0.30744410216949514], [0.04690208449415345, 0.322727131626243], [0.1388690574185909, 0.48576543724022325 ], [0.5260018011862109, 0.5859221562109312] 0.8974007607375208, 0.5713329992292489], [0.378989716528242, 0.49787928388753266]] [[1.5157583762374225, 0.9460413662192456, 0.9802340338281511], [1.5728362445918327, 0.996409724139607, 1.2530013664472253], [1 .9068174476481374, 1.430592927945995, 1.6704630594015581], [2.189768979209843, 2.3149543871163503, 2.1601629609824995], [2.8353 457102707083, 1.7422359297539565, 1.816707087141475], [2.0532913525958474, 1.9924093441385802, 2.3069493556139014]] [[[0.8077620147648772, 0.006392942850116379, 0.6080212915877307], [0.6288229869798402, 0.6410664904844843, 0.75419330562945] ], [[0.5355186530459589, 0.9211024178840701, 0.27725553497982014], [0.4507098181629161, 0.081570594016668, 0.8234980185346139]], [[0.0325944131753374, 0.7744753133142763, 0.05946983249285043], [0.7059580971549311, 0.7969953841197822 , 0.5257810951530107]]] [0.2579976950685653, 0.029957050945287222, 
0.18958928880952108] sample output: [[1.3758750809861735, 1.7854909022976875], [3.0778457550346365, 3.305123370918622], [4.022592414095037, 5.4085957902356965], [4. 022592414095037, 5.4085957902356965], [4.022592414095037, 5.4085957902356965], [4.022592414095037, 5.4085957902356965], [2.646717 333108864, 3.623104887938009], [0.9447466590604012, 2.1034724193170744]]

import sys import ast import numpy as np def parse_array(s): return np.array(ast.literal_eval(s)) def read_array(): return parse_array(sys.stdin.readline()) def write_array(arr): print(repr(arr. tolist())) def calculate_conv_x_grad(x, y, kernel, bias): """ x - InLen x InChannels y - OutLen x OutChannels kernel - OutChannels x InChannels x KernelSize bias - OutChannels returns InLen x InChannels """ x = read_array() y = read_array() kernel = read_array() bias = read_array() result = calculate_conv_x_grad(x, y, kernel, bias) write_array(result)
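The answer above repeats the template without filling in calculate_conv_x_grad. A minimal sketch follows, assuming a stride-1 convolution without padding, y[t, o] = bias[o] + sum over c and k of x[t + k, c] * kernel[o, c, k], and assuming the requested value is the gradient of the sum of all outputs with respect to x. Under that reading the first row of the gradient is the sum over output channels of kernel[:, :, 0], which agrees with the first row of the sample output. The tiny arrays at the bottom are hypothetical test values, not the sample from the prompt.

```python
import numpy as np

def calculate_conv_x_grad(x, y, kernel, bias):
    """
    x - InLen x InChannels
    y - OutLen x OutChannels
    kernel - OutChannels x InChannels x KernelSize
    bias - OutChannels
    returns InLen x InChannels
    """
    in_len, in_channels = x.shape
    out_len = y.shape[0]
    kernel_size = kernel.shape[2]
    # Every output channel contributes equally to sum(y), so sum the kernel over them
    k_sum = kernel.sum(axis=0)  # InChannels x KernelSize
    grad = np.zeros((in_len, in_channels))
    # Output position t reads x[t + k]; accumulate the matching kernel tap there
    for t in range(out_len):
        for k in range(kernel_size):
            grad[t + k] += k_sum[:, k]
    return grad

# Hypothetical check: one channel, kernel taps [1, 2], InLen 3, OutLen 2
x = np.zeros((3, 1))
y = np.zeros((2, 1))
kernel = np.array([[[1.0, 2.0]]])
bias = np.zeros(1)
grad = calculate_conv_x_grad(x, y, kernel, bias)
print(grad.tolist())
```

Because the convolution is linear in x, the gradient depends only on the kernel and the shapes; the values of x, y and bias are not used. The sketch requires InLen = OutLen + KernelSize - 1.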

Python
generate functionMon, 26 Dec 2022

There are 2 functions. One looks for duplicate values and makes a dictionary. The second changes the duplicates to the desired value. The first function (find_duplicates) works correctly. But the second function (change_duplicates) is incomplete. Finish the code so that the duplicate student number changes to the one in brackets in the change_duplicates function. def find_duplicates(lst): stnums = [] cnt = {} for el in lst: stnums.append(el[0]) for i in stnums: if i in cnt: cnt[i] += 1 else: cnt[i] = 1 return cnt def change_duplicates(lst, replacing_num): stnums = [] duplicates = find_duplicates(lst) for k, v in duplicates.items(): if v > 1: stnums = [k] for student in lst: if stnums in lst: #need right code#

# Python
## Python Basics
### Basic Syntax
Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms. The official Python website is www.python.org. There you will find the latest Python documentation. There are also mailing lists, newsgroups, and a number of other resources available at the site.
### Running Python
The Python interpreter is usually installed as /usr/local/bin/python3.7 on those machines where it is available; putting /usr/local/bin in your Unix shell’s search path makes it possible to start it by typing the command:
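The generated text above does not address the prompt. The unfinished change_duplicates could be completed roughly as follows. This is a sketch that assumes the first occurrence of a student number is kept and only later repeats are replaced; the short students list reuses three records from the related prompts, and "9090" is the replacement value mentioned there.

```python
def find_duplicates(lst):
    # Count how many times each student number (first field) occurs
    cnt = {}
    for el in lst:
        cnt[el[0]] = cnt.get(el[0], 0) + 1
    return cnt

def change_duplicates(lst, replacing_num):
    # Numbers that occur more than once
    duplicates = {k for k, v in find_duplicates(lst).items() if v > 1}
    seen = set()
    for student in lst:
        num = student[0]
        if num in duplicates:
            if num in seen:
                student[0] = replacing_num  # a repeat: overwrite in place
            else:
                seen.add(num)               # first occurrence stays as is
    return lst

students = [
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"],
    ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"],
    ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"],
]
change_duplicates(students, "9090")
print([s[0] for s in students])
```

The function modifies the inner lists in place and also returns the outer list for convenience.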

Python
generate functionTue, 21 Mar 2023

There are 3 lists: water, nutri and mineral. They differ in length: print(len(water)) - 20 print(len(nutri)) -21 print(len(mineral)) -21 add them to df Pandas dataframe in the following format: index treatments value 0 0 water 25 1 1 water 30 2 2 water 28 3 3 nutri 36 4 4 nutri 29 5 0 nutri 45 6 1 mineral 55 7 2 mineral 29 8 3 mineral 56

import pandas as pd

# Short illustrative lists; the real ones hold 20, 21 and 21 values
water = [25, 30, 28, 24, 34]
nutri = [36, 29, 45, 23, 30, 39, 28]
mineral = [55, 29, 56, 20]

# Label every value with its treatment, then stack everything in one long frame.
# Replacing values by list membership would mislabel numbers shared by two lists (29 here).
df = pd.DataFrame({
    'treatments': ['water'] * len(water) + ['nutri'] * len(nutri) + ['mineral'] * len(mineral),
    'value': water + nutri + mineral,
})
df

Python
generate functionWed, 08 Mar 2023

Write code that adds money +=1 on any number from the list of winnums winnums = [777, 999, 555, 333, 111, 177, 277, 377, 477, 577, 677, 877, 977, 100, 200, 300, 400, 500, 600, 700, 800, 900, 110, 120, 130, 140, 150, 160, 170, 180, 190, 210, 220, 230, 240, 250, 260, 270, 280, 290, 310, 320, 330, 340, 350, 360, 370, 380, 390, 410, 420, 430, 440, 450, 460, 470, 480, 490, 510, 520, 530, 540, 550, 560, 570, 580, 590, 610, 620, 630, 640, 650, 660, 670, 680, 690, 710, 720, 730, 740, 750, 760, 770, 780, 790, 810, 820, 830, 840, 850, 860, 870, 880, 890, 910, 920, 930, 940, 950, 960, 970, 980, 990, 107, 117, 127, 137, 147, 157, 167, 177, 187, 197, 207, 217, 227, 237, 247, 257, 267, 277, 287, 297, 307, 317, 327, 337, 347, 357, 367, 377, 387, 397, 407, 417, 427, 437, 447, 457, 467, 477, 487, 497, 507, 517, 527, 537, 547, 557, 567, 577, 587, 597, 607, 617, 627, 637, 647, 657, 667, 677, 687, 697, 707, 717, 727, 737, 747, 757, 767, 787, 797, 807, 817, 827, 837, 847, 857, 867, 877, 887, 897, 907, 917, 927, 937, 947, 957, 967, 977, 987, 997, 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090]

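def add_money(number, money, winnums):
    # number and money are assumed inputs: the drawn number and the current balance.
    # Note: entries such as 007 are not valid Python 3 integer literals,
    # so the list needs strings or numbers without leading zeros.
    if number in winnums:
        money += 1
    return money

money = add_money(777, 0, winnums)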

Python
generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function def how_much_water(L,X,N) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes):
    # Each item above the standard load costs 10% more water (multiplicative)
    return water * 1.1 ** (clothes - load)

Python
generate functionFri, 17 Mar 2023

There are two lists with numbers: sync = [85.1 83.8 69.9 82.1 84.4 80.4 78.1 88.4 77. 91.5 76.7 86.6 91.8 73.3 83.9 76.7 85.8 89.6 91.7 87.2 79. 85.3] asyncr = [89.8 81.6 87.4 81. 66.9 72.5 78.4 68.5 78.3 62.6 73.7 77.7 63. 77.5] Write code that calculates the similarity statistics of these samples, p-value

from scipy.stats import ttest_ind

sync = [85.1, 83.8, 69.9, 82.1, 84.4, 80.4, 78.1, 88.4, 77.0, 91.5, 76.7, 86.6, 91.8, 73.3, 83.9, 76.7, 85.8, 89.6, 91.7, 87.2, 79.0, 85.3]
asyncr = [89.8, 81.6, 87.4, 81.0, 66.9, 72.5, 78.4, 68.5, 78.3, 62.6, 73.7, 77.7, 63.0, 77.5]

# Two-sample t-test; the result unpacks into the t statistic and the p-value
statistic, p_value = ttest_ind(sync, asyncr)
print(statistic, p_value)

Python
generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 210 g Tomatoes: 6 pcs. Cucumbers: 60g Olives: 30 g Olive oil: 60 ml Lettuce: 30 gr Pepper: 60 gr

def get_shop_list_by_dishes(dishes, person_count): shop_list = {} for dish in dishes: for ingridient in cook_book[dish]: new_shop_list_item = dict(ingridient) new_shop_list_item['quantity'] *= person_count if new_shop_list_item['ingridient_name'] not in shop_list: shop_list[new_shop_list_item['ingridient_name']] = new_shop_list_item else: shop_list[new_shop_list_item['ingridient_name']]['quantity'] +=\ new_shop_list_item['quantity'] return shop_list def print_shop_list(shop_list): for shop_list_item in shop_list.values(): print('{} {} {}'.format(shop_list_item['ingridient_name'], shop_list_item['quantity'], shop
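The generated answer above is cut off and does more than the prompt asks. For the narrower task of multiplying each quantity by the number of portions, a shorter sketch is enough. The function name scale_recipe is hypothetical, and the cook_book below is the one from the prompt with the measure strings written in Latin letters:

```python
def scale_recipe(cook_book, dish, portions):
    """Return the ingredients of `dish` with quantities multiplied by `portions`."""
    # Build new dictionaries so the original cook_book is left untouched
    return [
        {**item, 'quantity': item['quantity'] * portions}
        for item in cook_book[dish]
    ]

cook_book = {'salad': [
    {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'},
    {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'},
    {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'gr'},
]}

scaled = scale_recipe(cook_book, 'salad', 5)
for item in scaled:
    print(f"{item['ingridient_name'].capitalize()}: {item['quantity']} {item['measure']}")
```

Copying each ingredient before scaling means the function can be called repeatedly with different portion counts.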

Python
generate functionTue, 21 Feb 2023

There is a Pandas dataframe: Last Updated Installs 7479 2010-05-21 100000 7430 2011-01-30 50000 10282 2011-03-16 100000 8418 2011-04-11 5000000 8084 2011-04-16 50000 9067 2011-04-18 50000 5144 2011-05-12 100 7237 2011-06-23 1000 10460 2011-06-26 5000 1544 2011-06-29 1000000 7080 2011-07-10 5000000 8200 2011-09-20 50000 5561 2011-09-22 1000000 Write a function that will create a 'year' column, in which the first 4 characters of the 'Last Updated' column will be added

def year(df):
    # Cast to str first so this also works when 'Last Updated' is a datetime column
    df["year"] = df["Last Updated"].astype(str).str[:4]
    return df

Python
generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

import numpy as np import scipy.stats as st import seaborn as sns import matplotlib.pyplot as plt def difference_of_means(data_1, data_2): """Difference in means of two arrays.""" # The difference of means of data_1, data_2: diff diff = np.mean(data_1) - np.mean(data_2) return diff def permutation_sample(data1, data2): """Generate a permutation sample from two data sets.""" # Concatenate the data sets: data data = np.concatenate((data1, data2)) # Permute the concatenated array: permuted_data permuted_data = np.random.permutation(data) # Split the permuted array into two: perm_sample_1, perm_sample_2 perm_sample_1 = permuted_data[:len(data1)] perm_sample_2 = permuted_data[

Python
generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 155 [free, entry, wkly, comp, win, fa, cup, final,... The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] For index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['message'], 'actual': df.iloc[index]['label'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it

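# Assumes y_test is a pandas Series that kept df's row labels after the split.
# The KeyError has two sources: y_test[index] looks up a label rather than a position,
# and the columns are named 'Message' and 'Category', not 'message' and 'label'.
misclassified = []
for position, label_predicted in enumerate(predicted):
    actual = y_test.iloc[position]
    if label_predicted != actual:
        row = df.loc[y_test.index[position]]
        misclassified.append({'message': row['Message'], 'actual': row['Category'], 'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)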

Python
generate functionMon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

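def get_students(gr, dct=dct):
    students = []
    for k, v in dct.items():
        if gr == v[4]:
            students.append(v[0] + ' ' + v[1] + ' ' + v[2])
    # Sort alphabetically, then number the entries starting from 1
    return [f'{n}. {name}' for n, name in enumerate(sorted(students), start=1)]

get_students("BST161")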

Python
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def class_delete(df, Class1, Class2):
    # Keep only the rows whose classes differ from the given values
    df = df.loc[df['Class 1'] != Class1]
    df = df.loc[df['Class 2'] != Class2]
    return df

df = class_delete(df, 'Rock', 'Rock')

Python
generate functionMon, 26 Dec 2022

There are two lists. The first (stnums) contains rows that are considered duplicates. The second list is a list of students. Write a function that replaces the values in the second list that match the first (but only the 2nd, 3rd value, and so on, not the first one). stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

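def replace(stnums, students, new_num='9090'):
    # new_num is an assumed placeholder: this prompt does not say what to substitute.
    # Removing items from a list while iterating over it skips elements,
    # so the numbers are overwritten in place instead.
    seen = set()
    for student in students:
        if student[0] in stnums:
            if student[0] in seen:
                student[0] = new_num  # 2nd and later occurrences
            else:
                seen.add(student[0])  # the first occurrence stays
    return students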

Python
generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

import pandas as pd

data = pd.read_csv('pokemon.csv')

def delete(data):
    # A boolean mask avoids dropping rows one by one and works with any index
    is_rock = (data['Class 1'] == 'Rock') | (data['Class 2'] == 'Rock')
    return data[~is_rock]

delete(data)

Python
generate functionThu, 19 Jan 2023

There is a Pandas dataframe: place decade perc_of_5star 0 1 2010-2020 2.300 1 2 1900-1910 1.379 2 3 1970-1980 1.179 3 4 2000-2010 1.176 4 5 1960-1970 1.133 build a horizontal barchart with perc_of_5star columns in descending order and decade values on the y-axis

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'place': [1, 2, 3, 4, 5],
                   'decade': ['2010-2020', '1900-1910', '1970-1980', '2000-2010', '1960-1970'],
                   'perc_of_5star': [2.300, 1.379, 1.179, 1.176, 1.133]})
# barh draws the first row at the bottom, so sort ascending to get the largest bar on top
df = df.sort_values('perc_of_5star')
plt.barh(df['decade'], df['perc_of_5star'])
plt.xlabel('Percentage of 5 star reviews')
plt.ylabel('Decades')
plt.title('Percentage of 5 star reviews by decade')
plt.show()

Python
generate functionThu, 23 Mar 2023

Write code that uses import os, import glob or other libraries to write all the text files at this url into one variable book_files book_files = ('http://stup.pro/wp-content/uploads/2023/03/*.txt') book_files

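import os

def read_files_in_directory(directory):
    # Reads local .txt files only: os and glob cannot expand a wildcard in an http URL
    book_files = []
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith('.txt'):
            continue
        # Join with the directory, otherwise open() looks in the current working directory
        with open(os.path.join(directory, filename), 'r') as f:
            book_files.append(f.read())
    return book_files

book_files = read_files_in_directory(r'C:\Users\USER\Desktop\Python\Python Basics')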

Python
generate functionThu, 15 Dec 2022

Create a function that checks if there is a docnum value in the list with the dictionary, and if there is, deletes it, otherwise print Document not found docnum = '10006' documents = [ {'type': 'passport', 'number': '2207 876234', 'name': 'Василий Гупкин'}, {'type': 'invoice', 'number': '11-2', 'name': 'Геннадий Покемонов'}, {'type': 'insurance', 'number': '10006', 'name': 'Аристарх Павлов'} ]

def del_doc_by_num(doc_num):
    for document in documents:
        if document['number'] == doc_num:
            documents.remove(document)
            # Stop right after the removal so the list is not iterated further
            return
    print('Document not found')

del_doc_by_num('10006')

Python
generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to strings

def df_to_string(df):
    # Convert the values of every column to strings and return the new frame
    return df.astype(str)

Python
generate functionMon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

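def student_list(gr):
    lst = []
    for i in dct.keys():
        if gr == dct[i][-1]:
            lst.append(dct[i][:3])
    # Lists compare element by element: surname, then first name, then patronymic
    lst.sort()
    # Number the sorted entries starting from 1
    return [f"{n}. {' '.join(names)}" for n, names in enumerate(lst, start=1)]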

Python
generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns hapscore = pd.DataFrame({'2015': [df15['Happiness Score'].mean()], '2016': [df16['Happiness Score'].mean()], '2017': [df17['Happiness.Score'].mean()], '2018': [df18['Score'].mean()], '2019': [df19['Score'].mean()]}) 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

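import matplotlib.pyplot as plt

def happinessgraph():
    # hapscore has a single row with one column per year;
    # plotting that row puts the years on the x-axis
    hapscore.iloc[0].plot(kind='line', marker='o')
    plt.xlabel("Years")
    plt.ylabel("Happiness Score")
    plt.title("Graph of Happiness Score over the Years")
    plt.show()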

Python
generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that finds the mean values of each column and transposes them to rows

def mean_values(df):
    # numeric_only=True skips text columns such as 'country'
    return df.mean(numeric_only=True).to_frame().T

Python
generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 in this dataframe the names of films are not unique, because the same film may have been given different ratings Your task is to write a function that calculates the average rating of each movie and removes non-unique strings

pd.DataFrame({'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'title': ['Pulp Fiction (1994)', 'Three Colors: Red (Trois couleurs: Rouge) (1994)', 'Three Colors: Blue (Trois couleurs: Bleu) (1993)', 'Underground (1995)', "Singin' in the Rain (1952)", 'Dirty Dancing (1987)', 'Delicatessen (1991)', 'Ran (1985)', 'Seventh Seal, The (Sjunde inseglet, Det) (1957)', 'Bridge on the River Kwai, The (1957)'], 'rating': [5.0, 3.5, 5.0, 5.0, 3.5, 4.0, 3.5, 3.5, 5.0, 4.0]})
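The answer above only rebuilds the dataframe and never averages anything. A minimal sketch of the requested function follows, assuming the title and rating column names from the prompt. The second "Pulp Fiction (1994)" rating of 4.0 is a hypothetical value added to show the averaging:

```python
import pandas as pd

def group_avg_ratings(df):
    # One row per title, holding the mean of all ratings given to that film
    return df.groupby('title', as_index=False)['rating'].mean()

df = pd.DataFrame({
    'title': ['Pulp Fiction (1994)', 'Ran (1985)', 'Pulp Fiction (1994)'],
    'rating': [5.0, 3.5, 4.0],
})
avg = group_avg_ratings(df)
print(avg)
```

Grouping by title removes the duplicate rows as a side effect, so no separate drop_duplicates call is needed.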

Python
generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of strings for all "operator"

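# Count the rows per operator, then average those counts
# (str.len().mean() would give the average length of the operator name instead)
df_new["operator"].value_counts().mean()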

Python
generate functionMon, 13 Mar 2023

There is a logistic linear discriminant model trained using the following formula: from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(data, data_class, random_state=42) from sklearn.discriminant_analysis import LinearDiscriminantAnalysis lda.fit(X_train, y_train) Write code to evaluate its quality with F1 measure

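from sklearn.metrics import f1_score

y_pred = lda.predict(X_test)
# The default average='binary' only suits two classes;
# pass e.g. average='macro' for a multiclass target
f1_score(y_test, y_pred)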

Python
generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Which working environment is most preferable for you?') fig.show()

import pandas as pd
import plotly.express as px

def plot_bar(df, question, title_text, xlabel, ylabel, orientation='h'):
    counts = df[question].value_counts()
    # Horizontal bars take the counts on x; vertical bars take the labels on x
    if orientation == 'h':
        fig = px.bar(x=counts.values, y=counts.index, orientation='h')
    else:
        fig = px.bar(x=counts.index, y=counts.values, orientation='v')
    fig.update_layout(title_text=title_text, xaxis_title=xlabel, yaxis_title=ylabel)
    fig.show()

df = pd.read_csv('https://raw.githubusercontent.com/GODKarma/Data-Analytics-2020/master/Data/survey_results_public.csv', index_col='Respondent')
plot_bar(df, 'LanguageWorkedWith', 'Which programming languages have you worked with?', 'Count', 'Languages')

Python
generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) leave only the rows in the year column with a value higher than 1980 2) calculates the average temperature of the countries (the average of the av_temp column) 3) builds a list of the 20 coldest countries in ascending av_temp order

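def calc_av_temp(df):
    # 1) keep only the years after 1980
    df = df[df['year'] > 1980]
    # 2) mean temperature per country ('mean' avoids needing a numpy import)
    df = df.groupby('country').agg({'av_temp': 'mean'})
    # 3) the 20 coldest countries in ascending order
    return df.sort_values('av_temp').head(20)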

Python
