
File-related Metadata

Hello,
I'm having an issue where I'm getting back two or three metadata properties
that relate to a temp file that Tika is apparently creating under the
hood:
File Modified Date (the current date)
File Name (temp file name: apache-tika-3021300783416279997.tmp)
File Size
I assume this is because I only have a stream to give Tika and no longer
have a physical file. However, users are seeing these (particularly the
modified date) and misinterpreting them.
I'd like to exclude these, which I could of course do with a
string-based filter. However, that feels a little hackish. I was hoping
there might be some way to deactivate file metadata when Tika is the one
that created the temp file. I tried to find the spot in Tika where these
are added by grepping all the source, but I seem to have come up empty for
some reason.
Thanks for any pointers,
Brian

asked Mar 25 2016 at 14:07

Brian Young



1 Reply for: File-related Metadata
So, after some debugging I discovered the root cause. The image metadata
extractor is producing properties for "file modified date", "file name" and
"file size".
Unfortunately, as mentioned in my original post, the file information is at
times misleading since it can reflect a Tika temp file. However, I'm not
sure there is a clean or painless way to exclude these from the
ImageMetadataExtractor short of replacing that parser with a subclass.
So the easiest fix might be for me to remove these properties as a
post-processing step after the metadata is extracted. Not as clean as I
would like, but it should work.
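
A minimal sketch of that post-extraction cleanup, assuming the extracted
values have already been copied into a plain Map. The class and method
names are hypothetical and not part of Tika; the property names are the
ones listed in the original question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: drops file-level properties that
// may describe a temp file rather than the original document.
class FileMetadataFilter {

    // Property names as reported in the original question.
    static final List<String> FILE_PROPERTIES =
            Arrays.asList("File Modified Date", "File Name", "File Size");

    // Returns a copy of the extracted metadata without the file-level
    // entries; the input map is left untouched.
    static Map<String, String> withoutFileProperties(Map<String, String> extracted) {
        Map<String, String> cleaned = new LinkedHashMap<>(extracted);
        for (String name : FILE_PROPERTIES) {
            cleaned.remove(name);
        }
        return cleaned;
    }
}
```

When working directly with a Tika Metadata object, the analogous step
should be a call to its remove(String) method for each of these names after
parsing. Note that this removes the properties unconditionally, including
in cases where they describe a real file rather than a temp file.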
Thanks,
Brian

answered Mar 28 2016 at 13:13

Brian Young


Group Tika-user
