I have a dataset called Auto.csv, which has the form:
mpg cylinders displacement horsepower weight acceleration year origin name
18 8 307 130 3504 12 70 1 chevrolet chevelle malibu
15 8 350 165 3693 11.5 70 1 buick skylark 320
18 8 318 150 3436 11 70 1 plymouth satellite
16 8 304 150 3433 12 70 1 amc rebel sst
17 8 302 140 3449 10.5 70 1 ford torino
15 8 429 198 4341 10 70 1 ford galaxie 500
14 8 454 220 4354 9 70 1 chevrolet impala
14 8 440 215 4312 8.5 70 1 plymouth fury iii
14 8 455 225 4425 10 70 1 pontiac catalina
? 8 390 190 3850 8.5 70 1 amc ambassador dpl
15 8 383 170 3563 10 70 1 dodge challenger se
14 8 340 160 3609 8 70 1 plymouth 'cuda 340
15 8 400 150 3761 9.5 70 1 chevrolet monte carlo
14 8 455 225 3086 10 70 1 buick estate wagon (sw)
24 4 113 95 2372 15 70 3 toyota corona mark ii
22 6 198 95 2833 15.5 70 1 plymouth duster
18 6 199 97 2774 15.5 70 1 amc hornet
21 6 200 85 2587 16 70 1 ford maverick
Imagine that the dataset has 1million lines, and possibly somewhere there is a character in place of a number, that is, by an error, somewhere in the column mpg
that should contain only numbers there is a strange character like '?'
import pandas as pd
import numpy as np
carros = pd.read_csv('Auto.csv',sep =',')
print(carros['mpg'])
How can I test if the mpg
column contains only numbers, and if not, return a new Pandas dataframe with the row removed?
Dataset source: link