Unexpected behaviour of Python logging in Databricks

Category: Azure Data Lake

Question

Mallaiah somula on Mon, 10 Jun 2019 08:05:49


We are seeing unexpected behaviour from Python logging in Databricks. When we use mode='a', the log message is added multiple times, and when we use mode='w', the logs are appended to the file instead of overwriting it.

Issue: the file modes are not behaving as documented.

When I execute my code twice with mode='a':

Expected:

test - this is info -INFO

test - this is info -INFO

Databricks behaviour:

test - this is info -INFO

test - this is info -INFO

test - this is info -INFO

When I execute my code twice with mode='w':

Expected:

test - this is info -INFO

Databricks behaviour:

test - this is info -INFO

test - this is info -INFO

Second issue:

We created a file in the data lake from Databricks and then deleted it in the data lake. When we check whether the file exists at that path, it still says the file exists.

Used code:

import os

os.path.exists('filepath')



Replies

MartinJaffer-MSFT on Mon, 10 Jun 2019 19:05:56


Hello Mallaiah somula. Could you please clarify whether you are using Data Lake Storage Gen1 or Data Lake Storage Gen2?

Also, what version of the Databricks Runtime and which Python version are you using?

Mallaiah somula on Tue, 11 Jun 2019 05:25:52


We are using Azure Data Lake Storage Gen2.

Databricks Runtime: 5.3 (includes Spark 2.4.0 and Scala 2.11)

Python version: 3

Mallaiah somula on Wed, 12 Jun 2019 07:31:45


One more issue we currently have: we are not able to execute the lines of code below on the Databricks cluster.

df=spark.createDataFrame([(1,'rama'),(2,'krishna')],['id','name'])
df.show()

I have tried multiple times, creating multiple workspaces in the Azure portal and multiple clusters; the cluster runs fine, but we are still not able to execute it.
We are using the same versions as mentioned below:
Python version: 3
Databricks Runtime: 5.3


MartinJaffer-MSFT on Wed, 12 Jun 2019 18:47:32


Are you using Spark calls to write to the file? That would explain why I was unable to even open a file with Python's open("/mnt/myDataLake2/myContainer/myFile.txt", "w").

Mallaiah somula on Thu, 13 Jun 2019 06:03:19


1. We are writing the code in Databricks notebooks, and these notebooks are called from ADF.

2. We are not able to read or write the file using the lines below in a notebook cell

with open('/mnt/myDataLake2/myContainer/myFile.txt', 'r') as file:
    file.read()

unless we prefix the file path with '/dbfs', like '/dbfs/mnt/myDataLake2/myContainer/myFile.txt'. Also, after we delete the file from the data lake, it still reports that the file exists when we check it with the code below:

import os

os.path.exists("/mnt/myDataLake2/myContainer/myFile.txt")
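For what it's worth, one possible explanation (an assumption on my part, not confirmed in this thread): the bare '/mnt/...' path is resolved against the driver's local filesystem, while local Python file APIs only see the mount under the '/dbfs' prefix, so a stale local copy (e.g. one created by an earlier run) can make os.path.exists return True after the lake file is deleted. A local sketch, with a temp directory standing in for the real '/dbfs' FUSE root:

```python
import os
import tempfile

# A temp dir stands in for the '/dbfs' FUSE root (illustration only;
# on Databricks you would use the literal '/dbfs' prefix).
fuse_root = tempfile.mkdtemp()
container = os.path.join(fuse_root, 'mnt/myDataLake2/myContainer')
os.makedirs(container)
open(os.path.join(container, 'myFile.txt'), 'w').close()

# The bare '/mnt/...' path is checked against the driver's local
# filesystem, not the mount, so it misses the file entirely:
bare = os.path.exists('/mnt/myDataLake2/myContainer/myFile.txt')
# The FUSE-prefixed path is the one local Python file APIs actually see:
fused = os.path.exists(os.path.join(container, 'myFile.txt'))
print(bare, fused)
```

The same reasoning cuts both ways: if some earlier code created a literal local '/mnt/...' directory on the driver, os.path.exists on the bare path would keep returning True no matter what happens in the lake.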

Please let us know if you need any more details.

MartinJaffer-MSFT on Sat, 15 Jun 2019 02:43:48


Sorry for the delay. I just got past my blocker; it seems my storage got into a weird state, maybe permissions. Anyway, here is what I have now.


The cluster was running 42 GB, Runtime 5.3 (includes Apache Spark 2.4.0, Scala 2.11).

MartinJaffer-MSFT on Sat, 15 Jun 2019 02:50:26


Make sure you don't have any blobs mixed in with your ADLS Gen2 files. Previously, it was possible to write to your storage with either 'flavor'; that no longer seems as easy to do, if it is possible at all. Data Lake methods don't play nicely with blobs.

After some more thought, have you checked permissions on the parent folders?

Also, it might be worth trying "dbutils.fs.refreshMounts()"

Mallaiah somula on Tue, 18 Jun 2019 06:36:58


We have tried all the possible options but we are still stuck. Please let us know the next steps.

HimanshuSinha-msft on Tue, 18 Jun 2019 20:24:11


Sorry to hear that the issue is still ongoing.

I see that you have multiple issues:

  1. Append & write issues - We see that Martin did try to repro the issue and it works fine for him. He has also posted a script which works for him. Can you please share the scripts which you are running?
  2. Data frame issue: Is this issue resolved? If not, can you please share the error message?
  3. Files not getting deleted: I think Martin's script checks this as well.

Which region is your Databricks workspace running in?

Mallaiah somula on Thu, 20 Jun 2019 05:12:31


The entire code I have used:

import logging
from datetime import datetime
import os

def custom_logger(path, filename):
  d = datetime.now()
  year_value = str(d)[0:4]
  month_value = str(d)[5:7]
  day_value = str(d)[8:10]
  hours_val = str(d)[11:13]
  min_val = str(d)[14:16]
  sec_val = str(d)[17:19]
  logger = logging.getLogger(filename)
  logger.setLevel(logging.DEBUG)
  filename1 = "./" + path + '/' + filename + '/'
  print(filename1)
  if not os.path.exists(filename1):
    print("creating the directories")
    os.makedirs(filename1)
  filename2 = filename1 + 'logs_' + year_value + month_value + day_value + '2.txt'
  print(filename2)
  filehandler = logging.FileHandler(filename2, 'a')
  formatter = logging.Formatter('%(asctime)s - %(name)s - %(message)s -%(levelname)s')
  filehandler.setFormatter(formatter)
  filehandler.setLevel(logging.DEBUG)
  logger.addHandler(filehandler)
  return logger
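One likely cause of the growing duplication described below (my reading of the code above, not confirmed in the thread): logging.getLogger(filename) returns the same logger object on every call, so each call to custom_logger attaches one more FileHandler, and every record is then written once per attached handler. A minimal sketch of the usual guard (the function and logger names here are illustrative, not from the original code):

```python
import logging
import os
import tempfile

def get_logger(logfile, name='dup_demo'):
    """Return a file logger, attaching the handler only on the first call."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # logging.getLogger(name) returns the *same* object every time, so an
    # unconditional addHandler() stacks one more FileHandler per call and
    # each record is then written once per attached handler.
    if not logger.handlers:
        handler = logging.FileHandler(logfile, 'a')
        handler.setFormatter(logging.Formatter('%(name)s - %(message)s - %(levelname)s'))
        logger.addHandler(handler)
    return logger

logfile = os.path.join(tempfile.mkdtemp(), 'app.log')
for _ in range(2):                     # simulate executing the notebook cell twice
    get_logger(logfile).info('this is info')

with open(logfile) as f:
    lines = f.read().splitlines()
print(len(lines))                      # 2: one line per run, not 1 + 2 = 3
```

Without the `if not logger.handlers` guard, run n writes n copies of each message, which matches the accumulating pattern reported for mode 'a'.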

Python: when we use mode 'a' it appends the data, and with mode 'w' it overwrites the file content.

Used Code:

filehandler=logging.FileHandler('path','a')

filehandler=logging.FileHandler('path','w')

Databricks: for a file that already exists, when we use mode 'a' it duplicates the data, e.g. if you run the same piece of code repeatedly, the file accumulates the same statement in the pattern 1, 3, 6, 12, ...

When we use mode 'w', instead of overwriting it appends the content, and it updates the previous logs' timestamps to the latest timestamp.
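If these anomalies come from writing through the DBFS FUSE mount (an assumption; the mount has historically handled appends and in-place rewrites poorly), a common workaround is to log to the driver's local disk and copy the finished file to the mount afterwards. A local sketch, with shutil.copy standing in for the Databricks-side copy and temp dirs standing in for the local scratch path and the mounted lake path:

```python
import logging
import os
import shutil
import tempfile

local_log = os.path.join(tempfile.mkdtemp(), 'run.log')  # driver-local scratch file
lake_dir = tempfile.mkdtemp()                            # stands in for '/dbfs/mnt/...'

logger = logging.getLogger('lake_job')
logger.setLevel(logging.INFO)
handler = logging.FileHandler(local_log, 'w')            # plain local file: 'a'/'w' behave normally
handler.setFormatter(logging.Formatter('%(name)s - %(message)s - %(levelname)s'))
logger.addHandler(handler)

logger.info('this is info')
handler.close()                                          # make sure everything is flushed

# On Databricks this copy would be something like
# dbutils.fs.cp('file:' + local_log, 'dbfs:/mnt/<container>/run.log')
shutil.copy(local_log, os.path.join(lake_dir, 'run.log'))

with open(os.path.join(lake_dir, 'run.log')) as f:
    content = f.read()
print(content.strip())
```

Because the FileHandler only ever touches a plain local file, the standard 'a'/'w' semantics apply, and the copy to the lake is a single whole-file write.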

Python: this statement should create the file at the given path if it doesn't exist.

Used Code:

import logging

filehandler=logging.FileHandler('path','a')

Databricks: the above code is not working in Databricks, i.e. the file is not getting created.

Our cluster is running in "uksouth".

ChristianB7 on Wed, 16 Oct 2019 13:54:36


Hi Mallaiah,

I came across this after also trying to use logging in Databricks. It still seems that logging.FileHandler either fails when creating the file, or doesn't fail but also doesn't output anything to the file. Did you manage to get a fully working version where we could call something like logger.info('my message')?

Thanks, Chris