Filter data in CSV files

Project goal: the client wants to remove certain rows from 22 CSV files. The rows to remove are identified by a separate CSV file that lists the names of the people to be eliminated.

First, I thought of using PHP and MySQL: import the CSV files and do the filtering in SQL. But I found that this was a lot of work and not the best approach to reuse in the future.

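Second, I thought of using Bash and AWK. It is a very good approach, but Bash arrays are harder to work with when a string contains spaces: an unquoted expansion is split on whitespace, so a name such as Jose A. Fernandes becomes three separate words.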

Along the way I learned that AWK is one of the best tools to use from Bash.


awk '/pattern/ {print}' /etc/hosts
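
For comparison, the same match-and-print idea can be sketched in Python. This is only an illustration: the function name grep_lines and the sample lines are made up here, and the regular expression is searched anywhere in the line, as awk does with /pattern/. Python's re syntax is not identical to awk's, so complex patterns may need adjusting.

```python
import re

def grep_lines(lines, pattern):
    """Return the lines in which the regular expression `pattern` matches anywhere."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical sample standing in for the contents of /etc/hosts.
sample = [
    "127.0.0.1 localhost",
    "192.168.1.10 fileserver",
    "::1 localhost ip6-localhost",
]

for line in grep_lines(sample, "localhost"):
    print(line)
```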

BASH array
allThreads=(1 2 4 8 16 32 64 128)

Looping through array elements

With that in mind, let’s loop through $allThreads and launch the pipeline for each value of --threads:

for t in "${allThreads[@]}"; do
  ./pipeline --threads "$t"
done

Looping through array indices

Next, let’s consider a slightly different approach. Rather than looping over array elements, we can loop over array indices:

for i in "${!allThreads[@]}"; do
  ./pipeline --threads "${allThreads[$i]}"
done

In the end I decided to use Python, because it performs very well and the code is easy to reuse. The code is below:

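import os
import glob

def clearcsv(csv, pattern):
    """Remove every line of the file `csv` that contains `pattern` (case-insensitive)."""
    needle = pattern.lower()
    with open(csv, 'r+', encoding='utf-8', errors="ignore") as f:
        # Keep only the lines that do not contain the pattern.
        data = ''.join([i for i in f if needle not in i.lower()])
        # Assumed completion: write the kept lines back over the same file.
        f.seek(0)
        f.write(data)
        f.truncate()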

def getCsvfiles():
    # All CSV files in the ./csv directory.
    return glob.glob("./csv/*.csv")

def getRemoveperson():
    # remove.csv holds one name per line.
    with open('remove.csv', "r", encoding="utf-8", errors="ignore") as f:
        # readlines() returns a list containing all lines in the file.
        lines = f.readlines()
    # The with block closes the file after reading the lines.
    return lines

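#clearcsv('20201020SD.csv','Jose A. Fernandes')
def process():
    files = getCsvfiles()
    removePeople = getRemoveperson()
    for f in files:
        for r in removePeople:
            removeP = r.strip()
            # Skip blank lines: an empty pattern would match every row.
            if removeP:
                # Assumed completion: filter this file for this name.
                clearcsv(f, removeP)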

def process2():
    removePeople = getRemoveperson()
    for r in removePeople:
        removeP = r.replace('\n','')
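
The approach above reopens each CSV file once per name. As a possible variation, the whole task can be sketched so that each file is read and rewritten only once, checking every row against all names in a single pass. This is a minimal sketch under assumptions, not the code used in the project: the function names load_names and filter_file and the sample data are made up for illustration. It keeps the same case-insensitive substring match, and it skips blank lines in the name list because an empty string is a substring of every row.

```python
import os
import tempfile

def load_names(path):
    """Read one name per line, skipping blank lines, lower-cased for comparison."""
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        return [line.strip().lower() for line in f if line.strip()]

def filter_file(csv_path, names):
    """Rewrite csv_path without the rows that contain any name; return rows removed."""
    with open(csv_path, "r", encoding="utf-8", errors="ignore") as f:
        rows = f.readlines()
    kept = [row for row in rows if not any(name in row.lower() for name in names)]
    with open(csv_path, "w", encoding="utf-8") as f:
        f.writelines(kept)
    return len(rows) - len(kept)

# Demo in a temporary directory with made-up data.
with tempfile.TemporaryDirectory() as workdir:
    remove_path = os.path.join(workdir, "remove.csv")
    data_path = os.path.join(workdir, "data.csv")
    with open(remove_path, "w", encoding="utf-8") as f:
        f.write("Jose A. Fernandes\n\n")  # trailing blank line on purpose
    with open(data_path, "w", encoding="utf-8") as f:
        f.write("name,city\nJose A. Fernandes,Lisbon\nMaria Silva,Porto\n")
    removed = filter_file(data_path, load_names(remove_path))
    with open(data_path, encoding="utf-8") as f:
        remaining = f.read()

print(removed)
print(remaining)
```

For the real files, the same two functions could be called in a loop over glob.glob("./csv/*.csv"). Note that substring matching also removes rows where a name appears inside a longer value, so it is worth checking the removed count against what is expected.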