Technology moves quickly, and one capability that keeps drawing attention is converting .pdf files to audio. This is useful for many purposes, such as study material, accessibility, or simply enjoying a book or paper without needing a screen. In this article, we walk through a Python approach to the problem and explain the steps needed to build a working script that converts your .pdf files to audio. We also discuss some of the key libraries and functions involved. So, let's get started!
A Python Solution for Converting PDF Files to Audio
Python offers a large collection of libraries and tools that let developers handle many tasks, including file conversion. One such library is PyPDF2, which lets us extract text from .pdf files. To turn the extracted text into audio, we can use another library called gTTS (Google Text-to-Speech). It sends the text to the text-to-speech endpoint of Google Translate and saves the returned speech as an audio file, so it needs an internet connection.
Here is a step-by-step walkthrough of the code for converting a .pdf file to an audio file with Python:
- First, install the required libraries by running the following command in your terminal or command prompt:
pip install PyPDF2 gtts
- Next, import the necessary libraries at the top of your Python script by adding these lines:
import PyPDF2
from gtts import gTTS
- Create a function that extracts the text from a .pdf file:
def extract_text_from_pdf(pdf_path):
    # Open the PDF; PdfReader replaces PdfFileReader, which PyPDF2 3.x no longer supports
    reader = PyPDF2.PdfReader(pdf_path)
    # Extract text page by page; image-only pages may yield no text
    page_texts = []
    for page in reader.pages:
        page_texts.append(page.extract_text() or "")
    return "\n".join(page_texts)
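- Create another function that converts the extracted text to an audio file:
def text_to_audio(text, output_audio_file):
    # gTTS refuses empty input, so fail early with a clearer message
    if not text.strip():
        raise ValueError("No text was extracted from the PDF")
    # Initialize the gTTS object (English, normal speed)
    tts = gTTS(text=text, lang='en', slow=False)
    # Save the audio file (gTTS writes MP3 data)
    tts.save(output_audio_file)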
- Finally, use the two functions to convert the .pdf file of your choice to audio:
pdf_file_path = "example.pdf"
audio_output_file = "output_audio.mp3"
extracted_text = extract_text_from_pdf(pdf_file_path)
text_to_audio(extracted_text, audio_output_file)
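Text pulled out of a PDF often keeps the page layout: sentences broken across lines and words hyphenated at line ends, which can make the spoken result sound choppy. The following is a minimal sketch of a cleanup step that could sit between extraction and speech synthesis. The helper name `clean_extracted_text` is hypothetical and not part of PyPDF2 or gTTS, and the rules are deliberately simple: genuine compounds that happen to break at a line end (such as "well-" followed by "known") would also be merged.

```python
import re


def clean_extracted_text(raw_text: str) -> str:
    """Hypothetical helper: tidy PDF text before passing it to a TTS engine."""
    # Rejoin words that were hyphenated across a line break ("exam-\nple").
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw_text)
    # Collapse every remaining run of whitespace, newlines included, into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

With this in place, the last step could pass `clean_extracted_text(extracted_text)` to `text_to_audio` instead of the raw text.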
Now that we have covered the essential steps of our Python script, let's look at some related libraries and functions.
Alternative PDF and Text-Processing Tools in Python
While our example uses PyPDF2 and gTTS, the Python ecosystem offers other libraries for the same tasks:
- PDFMiner: A library designed for extracting information from PDF files, such as text, images, metadata, and even form data. It offers a broader set of tools for text extraction and manipulation than PyPDF2.
- textract: A library that simplifies extracting text from a range of file types, including PDF and Microsoft Office files. textract can be a good alternative if you need to extract text from many different file formats.
- pyttsx3: An offline, cross-platform text-to-speech library for Python. While gTTS depends on Google's API, pyttsx3 uses the text-to-speech engine installed on your system, so it works offline and keeps your text on your own machine.
These alternatives can provide additional features and options, depending on your specific needs. Feel free to explore them further and choose the one that best fits your project.
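To make the comparison concrete, here is a rough sketch of how two of these alternatives might replace the functions from the walkthrough. It assumes the `pdfminer.six` and `pyttsx3` packages are installed; the function names and the `engine` and `rate` parameters are illustrative choices, not part of either library. The libraries are imported inside the functions so that each one is only needed when its function is called.

```python
def extract_text_with_pdfminer(pdf_path):
    """Sketch: extract all text from a PDF with pdfminer.six instead of PyPDF2."""
    # Imported lazily; assumes `pip install pdfminer.six`.
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)


def text_to_audio_offline(text, output_audio_file, engine=None, rate=150):
    """Sketch: save `text` as speech with pyttsx3 instead of gTTS (no network)."""
    if engine is None:
        # Imported lazily; assumes `pip install pyttsx3` and a system speech engine.
        import pyttsx3
        engine = pyttsx3.init()
    # Speaking rate in words per minute (150 is an arbitrary illustrative value).
    engine.setProperty("rate", rate)
    # Queue the text for rendering to a file, then process the queue.
    engine.save_to_file(text, output_audio_file)
    engine.runAndWait()
    return output_audio_file
```

Because pyttsx3 delegates to the speech driver of the operating system, the available voices and the audio format actually written may differ between platforms, so it is worth checking the output file on your own system.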
In this article, we presented a Python solution for converting .pdf files to audio, explained the steps needed to build a working script, and discussed several libraries and functions related to it. By following these guidelines and understanding the reasoning behind the code, you can build on what you have learned and adapt this solution to other file formats or use cases. Happy coding!