Solved: pandas backfill after upsampling

Data handling and analysis are essential for understanding various phenomena and making informed decisions. One common task in data analysis is resampling time series data, which means changing the frequency of the data, either by upsampling (increasing the frequency) or downsampling (decreasing the frequency). In this article, we discuss how to backward fill while upsampling time series data using Python's Pandas library.
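To make the two directions of resampling concrete, here is a minimal sketch. The six hourly readings and the variable names are made up for illustration and are not part of the article's example.

```python
import pandas as pd

# Hypothetical sample: six hourly readings
idx = pd.date_range("2022-01-01", periods=6, freq="h")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Downsampling: fewer, coarser rows (here the mean of each 2-hour bin)
down = s.resample("2h").mean()

# Upsampling: more, finer rows; the newly created slots hold NaN until filled
up = s.resample("30min").asfreq()

print(len(s), len(down), len(up))
print(up.isna().sum(), "new rows are missing a value")
```

Downsampling needs an aggregation such as mean(), while upsampling needs a fill strategy, which is the subject of the rest of this article.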

Backward Filling Time Series Data

When we upsample time series data, we increase the frequency of the data points, which usually leaves missing values at the newly created points. Several methods can fill these gaps. One of them is the backward fill, also known as backfilling: each missing value is filled with the next available value in the time series.
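As a small illustration of the definition above, the sketch below backward fills a short Series and contrasts it with a forward fill. The values are arbitrary and chosen only to show the direction of the fill.

```python
import numpy as np
import pandas as pd

# Arbitrary illustrative values with two gaps
s = pd.Series([10.0, np.nan, 40.0, np.nan])

# Backward fill: each gap takes the NEXT known value
back = s.bfill()

# Forward fill, for contrast: each gap takes the PREVIOUS known value
fwd = s.ffill()

print(back.tolist())
print(fwd.tolist())
```

Note that the last gap has no later value to copy, so a backward fill leaves it as NaN.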

Pandas Library

Python's Pandas library is an essential tool for data manipulation, offering a wide range of functionality for data structures such as DataFrames and time series. Pandas has built-in features that simplify working with time series, such as resampling and filling missing values, which let us perform a backward fill after upsampling efficiently.

Solution: Backward Fill with Pandas

To demonstrate a backward fill after upsampling time series data with Pandas, let's take a simple example. We start by importing the necessary libraries and creating a sample time series dataset.

import pandas as pd
import numpy as np

# Create a sample time series dataset
date_rng = pd.date_range(start='2022-01-01', end='2022-01-10', freq='D')
data = np.random.randint(0, 100, size=(len(date_rng), 1))

df = pd.DataFrame(date_rng, columns=['date'])
df['value'] = data

Now that we have our sample data, we can upsample it and apply the backward fill. In this example, we upsample from daily to hourly frequency:

# Upsample the data to hourly frequency
df.set_index('date', inplace=True)
hourly_df = df.resample('h').asfreq()  # 'h' replaces the deprecated 'H' alias

# Apply the backward fill method to fill missing values
hourly_df = hourly_df.bfill()  # fillna(method='bfill') is deprecated in recent pandas

In the code above, we first set the 'date' column as the index and then resample the data to hourly frequency with the resample() function, using asfreq() to materialize the new rows. The resulting DataFrame contains missing values because of the increased frequency. We then apply a backward fill (bfill) to fill those missing values.
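The two steps can also be collapsed into one, because the object returned by resample() has its own bfill() method. The sketch below is a variation on the example, not the article's original code: the fixed random seed and the names two_step and one_step are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Same shape of data as the example; the seed is an assumption for reproducibility
rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", "2022-01-10", freq="D")
df = pd.DataFrame({"value": rng.integers(0, 100, size=len(idx))}, index=idx)

# Two steps: create the hourly rows as NaN, then backward fill them
two_step = df.resample("h").asfreq().bfill()

# One step: upsample and backward fill in a single call
one_step = df.resample("h").bfill()

# Compare the filled values as floats so the check does not depend on dtype
same = np.array_equal(
    two_step["value"].to_numpy(dtype=float),
    one_step["value"].to_numpy(dtype=float),
)
print(same)
```

The resampler's bfill() also accepts a limit argument, which caps how many consecutive new rows are filled from a single original value.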

Step-by-Step Explanation

Let's break down the code to understand it better:

1. First, we import the Pandas and NumPy libraries:

   import pandas as pd
   import numpy as np
   

2. We create a sample time series dataset, using the date_range() function from Pandas to generate daily dates and NumPy to generate random values:

   date_rng = pd.date_range(start='2022-01-01', end='2022-01-10', freq='D')
   data = np.random.randint(0, 100, size=(len(date_rng), 1))
   df = pd.DataFrame(date_rng, columns=['date'])
   df['value'] = data
   

3. Next, we set the 'date' column as the index and upsample the data to hourly frequency with the resample() and asfreq() functions:

   df.set_index('date', inplace=True)
   hourly_df = df.resample('h').asfreq()
   

4. Finally, we fill the missing values in the upsampled DataFrame with a backward fill (bfill):

   hourly_df = hourly_df.bfill()
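The four steps above can be put together and checked as follows. This is a sketch rather than a definitive implementation; the row-count check and the names expected_rows, first_gap, and next_day are additions for illustration.

```python
import numpy as np
import pandas as pd

# Steps 1-2: build the daily sample data
date_rng = pd.date_range(start="2022-01-01", end="2022-01-10", freq="D")
df = pd.DataFrame({
    "date": date_rng,
    "value": np.random.randint(0, 100, size=len(date_rng)),
})

# Step 3: index by date and upsample to hourly frequency
df = df.set_index("date")
hourly_df = df.resample("h").asfreq()

# Step 4: backward fill the newly created rows
hourly_df = hourly_df.bfill()

# The hourly index runs from Jan 1 00:00 to Jan 10 00:00:
# 9 full days of 24 hours plus the final midnight timestamp
expected_rows = 9 * 24 + 1
print(len(hourly_df) == expected_rows)

# Every new hour on Jan 1 takes the value recorded at Jan 2 00:00
first_gap = hourly_df.loc[pd.Timestamp("2022-01-01 01:00"), "value"]
next_day = df.loc[pd.Timestamp("2022-01-02"), "value"]
print(first_gap == next_day)
```

Because the last row of the hourly index is an original observation, the backward fill leaves no missing values in this example.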
   

Conclusion

In this article, we explored how to backward fill after upsampling time series data using the Pandas library in Python. By understanding and applying these techniques, we can handle and analyze time series data effectively, uncover useful insights, and make well-informed decisions.
