I’m trying to add some metrics to a PySpark pipeline that feeds an index. I’ve found that obs.get blocks even after the write completes, and I can work around it by triggering another action on the DataFrame, such as the count below. Is a workaround like this necessary, or am I missing something in this code?
from pyspark.sql import Observation
from pyspark.sql.functions import count, lit

obs = Observation("metrics")
df = spark.read.load(format="jdbc", **pgoptions) \
    .withColumns(chpt_cols) \
    .select(*target_cols) \
    .observe(obs,
             count(lit(1)).alias("metric1"))
df.write.save(mode='append', format='es', **esoptions)
df.count()  # workaround: without this extra action, obs.get below never returns
print(obs.get)
Thanks in advance for your replies.